Abstract In this paper, we establish the convergence of the proximal alternating direction method of multipliers (ADMM) 
and block coordinate descent (BCD) for nonseparable minimization models with quadratic coupling terms. The novel 
convergence results presented in this paper answer several open questions that have been the subject of considerable 
discussion. We firstly extend the 2-block proximal ADMM to linearly constrained convex optimization with a coupled 
quadratic objective function, an area where theoretical understanding is currently lacking, and prove that the sequence 
generated by the proximal ADMM converges in a point-wise manner to a primal-dual solution pair. Moreover, we apply 
randomly permuted ADMM (RPADMM) to nonseparable multi-block convex optimization, and prove its expected 
convergence for a class of nonseparable quadratic programming problems. When the linear constraint vanishes, the 2- 
block proximal ADMM and RPADMM reduce to the 2-block cyclic proximal BCD method and randomly permuted 
BCD (RPBCD). Our study provides the first iterate convergence result for 2-block cyclic proximal BCD without assuming 
the boundedness of the iterates. We also theoretically establish the expected iterate convergence result concerning 
multi-block RPBCD for convex quadratic optimization. In addition, we demonstrate that RPBCD may have a worse 
convergence rate than cyclic proximal BCD for 2-block convex quadratic minimization problems. Although the results 
on RPADMM and RPBCD are restricted to quadratic minimization models, they provide some interesting insights: 1) 
random permutation makes ADMM and BCD more robust for multi-block convex minimization problems; 2) cyclic 
BCD may outperform RPBCD for “nice” problems, and therefore RPBCD should be applied with caution when solving 
general convex optimization problems. 

1 Introduction 


In this paper we consider the linearly constrained convex minimization model with an objective function that is the sum 
of several separable functions and a coupled quadratic function: 


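$$
\begin{aligned}
\min\ & \theta(x) := \sum_{i=1}^{n} \theta_i(x_i) + \frac{1}{2} x^\top H x + g^\top x \\
\text{s.t.}\ & \sum_{i=1}^{n} A_i x_i = b,
\end{aligned} \tag{1}
$$

where $\theta_i : \mathbb{R}^{d_i} \to (-\infty, +\infty]$ $(i = 1, 2, \ldots, n)$ are closed proper convex (not necessarily smooth) functions; $x_i \in \mathbb{R}^{d_i}$, $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^{d}$; $H \in \mathbb{R}^{d \times d}$ is a symmetric and positive semidefinite matrix; $g \in \mathbb{R}^{d}$; $A_i \in \mathbb{R}^{m \times d_i}$ and $b \in \mathbb{R}^{m}$. A point $(\bar{x}, \bar{\mu})$ is said to be a Karush-Kuhn-Tucker (KKT) point of (1) if it satisfies

$$
\begin{cases}
-(H\bar{x} + g)_i + A_i^\top \bar{\mu} \in \partial\theta_i(\bar{x}_i), \quad i = 1, \ldots, n, \\
\sum_{i=1}^{n} A_i \bar{x}_i = b.
\end{cases} \tag{2}
$$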

The set consisting of the KKT points of (1) is assumed to be nonempty. Problem (1) has many applications in signal and 
imaging processing, machine learning, statistics, and engineering; e.g., see [1, 14,19,29,41,42]. 

The augmented Lagrangian function of (1) is 

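$$
\mathcal{L}_\beta(x_1, \ldots, x_n; \mu) := \sum_{i=1}^{n} \theta_i(x_i) + \frac{1}{2} x^\top H x + g^\top x - \mu^\top \Big( \sum_{i=1}^{n} A_i x_i - b \Big) + \frac{\beta}{2} \Big\| \sum_{i=1}^{n} A_i x_i - b \Big\|^2, \tag{3}
$$

where $\mu \in \mathbb{R}^{m}$ is the Lagrangian multiplier and $\beta > 0$ is the penalty parameter. In this paper, we extend the n-block 
proximal alternating direction method of multipliers (ADMM) to solve the nonseparable convex minimization problem 
(1), which consists of a cyclic update of the primal variables $x_i$ $(i = 1, 2, \ldots, n)$ in the Gauss-Seidel fashion and a dual 
ascent type update of $\mu$ at each iteration, i.e., 

$$
\begin{cases}
x_1^{k+1} := \arg\min_{x_1 \in \mathbb{R}^{d_1}} \big\{ \mathcal{L}_\beta(x_1, x_2^k, \ldots, x_n^k; \mu^k) + \frac{1}{2}\|x_1 - x_1^k\|_{R_1}^2 \big\}, \\
x_2^{k+1} := \arg\min_{x_2 \in \mathbb{R}^{d_2}} \big\{ \mathcal{L}_\beta(x_1^{k+1}, x_2, x_3^k, \ldots, x_n^k; \mu^k) + \frac{1}{2}\|x_2 - x_2^k\|_{R_2}^2 \big\}, \\
\qquad \vdots \\
x_n^{k+1} := \arg\min_{x_n \in \mathbb{R}^{d_n}} \big\{ \mathcal{L}_\beta(x_1^{k+1}, \ldots, x_{n-1}^{k+1}, x_n; \mu^k) + \frac{1}{2}\|x_n - x_n^k\|_{R_n}^2 \big\}, \\
\mu^{k+1} := \mu^k - \beta \Big( \sum_{i=1}^{n} A_i x_i^{k+1} - b \Big),
\end{cases} \tag{4}
$$

where $R_i \in \mathbb{R}^{d_i \times d_i}$, $i = 1, \ldots, n$, are symmetric and positive semidefinite matrices. 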
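
To make the update order in (4) concrete, the following is a minimal sketch for $n = 2$ under strong simplifying assumptions that are not part of the paper: both blocks are scalars, $\theta_1 = \theta_2 = 0$, and every subproblem is therefore a one-dimensional quadratic with a closed-form minimizer. The function name `proximal_admm_2block` and all parameter names are hypothetical.

```python
def proximal_admm_2block(h11, h12, h22, g1, g2, a1, a2, b,
                         beta=1.0, r1=0.1, r2=0.1, iters=500):
    """Sketch of scheme (4) with n = 2 for the scalar problem
        min 0.5*h11*x1^2 + h12*x1*x2 + 0.5*h22*x2^2 + g1*x1 + g2*x2
        s.t. a1*x1 + a2*x2 = b,
    assuming theta_1 = theta_2 = 0 (illustrative only)."""
    x1 = x2 = mu = 0.0
    for _ in range(iters):
        # x1-step: set the derivative of
        # L_beta(x1, x2^k; mu^k) + (r1/2)*(x1 - x1^k)^2 to zero.
        x1 = (-h12 * x2 - g1 + a1 * mu
              - beta * a1 * (a2 * x2 - b) + r1 * x1) / (h11 + beta * a1 * a1 + r1)
        # x2-step: same formula, but using the freshly updated x1 (Gauss-Seidel).
        x2 = (-h12 * x1 - g2 + a2 * mu
              - beta * a2 * (a1 * x1 - b) + r2 * x2) / (h22 + beta * a2 * a2 + r2)
        # Dual ascent step on the multiplier.
        mu = mu - beta * (a1 * x1 + a2 * x2 - b)
    return x1, x2, mu


if __name__ == "__main__":
    # H = [[2, 1], [1, 2]], g = (-1, -1), constraint x1 + x2 = 1.
    print(proximal_admm_2block(2.0, 1.0, 2.0, -1.0, -1.0, 1.0, 1.0, 1.0))
```

For the data in the usage lines, the KKT system (2) can be solved by hand and gives $x_1 = x_2 = 1/2$ and $\mu = 1/2$, which is what the iterates should approach.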

Note that the algorithmic scheme (4) reduces to the classical ADMM when there are only two blocks ($n = 2$), the 
coupled objective vanishes ($H = 0$ and $g = 0$) and $R_i = 0$ $(i = 1, 2)$. ADMM was originally introduced in the early 
1970s [20,23], and its convergence properties have been studied extensively in the literature [6,15,17,18,22,28,40]. 
Because of its wide versatility and applicability in multiple fields, ADMM is a popular means of solving optimization 
problems, especially those related to big data; we refer to [8] for a survey on the modern applications of ADMM. 

For the case of $n \geq 3$, numerous research efforts have been devoted to analyzing the convergence of multi-block 
ADMM and its variants for the linearly constrained separable convex optimization model, i.e., (1) without the coupled 
term. Recent work [10] has shown that the n-block ADMM (4) is not necessarily convergent, even for a nonsingular 
square system of linear equations. Various methods have been proposed to overcome the divergence issue of multi-block 
ADMM. One typical solution is to combine correction steps with the output of n-block ADMM (4) [25-27]. If at least 
$n - 2$ functions in the objective are strongly convex, it has been shown that (4) is globally convergent, provided that the 
penalty parameter $\beta$ is restricted to a specific range [9,11,24,33,38,52]. Without strong convexity, it has been shown [30] 
that the n-block ADMM with a small dual stepsize $\tau$, where the multiplier update in (4) is replaced by 

$$
\mu^{k+1} = \mu^k - \tau\beta \Big( \sum_{i=1}^{n} A_i x_i^{k+1} - b \Big),
$$

is linearly convergent provided that the objective function satisfies certain error bound conditions. Some very recent 
studies [36,37] have demonstrated the convergence of multi-block ADMM under some other conditions, and some convergent 
proximal variants of the multi-block ADMM have been proposed for solving convex linear/quadratic conic programming 
problems [13,35,47]. A recent paper [48] proposed a randomly modified variant of the multi-block ADMM (4), called 
randomly permuted ADMM (RPADMM). At each step, RPADMM forms a random permutation of $\{1, 2, \ldots, n\}$ (known 
as block sampling without replacement), and updates the primal variables $x_i$ $(i = 1, 2, \ldots, n)$ in the order of the 
chosen permutation followed by the regular multiplier update. Surprisingly, RPADMM is convergent in expectation for any 
nonsingular square system of linear equations [48]. 
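
The only difference between cyclic multi-block ADMM and RPADMM is the order in which the blocks are visited in each sweep. The sketch below illustrates this for scalar blocks with $\theta_i = 0$ and a single linear constraint $a^\top x = b$; these restrictions, and the names `admm_sweep` and `rpadmm`, are assumptions made for illustration and are not taken from [48].

```python
import random


def admm_sweep(x, mu, H, g, a, b, beta, order):
    """One ADMM pass over scalar blocks in the given order, followed by
    the regular multiplier update. Assumes theta_i = 0 for all blocks."""
    n = len(x)
    for i in order:
        # Contributions of the other blocks to the coupling term and the constraint.
        coupling = sum(H[i][j] * x[j] for j in range(n) if j != i)
        others = sum(a[j] * x[j] for j in range(n) if j != i)
        # Closed-form minimizer of the augmented Lagrangian in x_i.
        x[i] = (-coupling - g[i] + a[i] * mu
                - beta * a[i] * (others - b)) / (H[i][i] + beta * a[i] ** 2)
    mu -= beta * (sum(a[j] * x[j] for j in range(n)) - b)
    return x, mu


def rpadmm(H, g, a, b, beta=1.0, iters=100, seed=0):
    """Randomly permuted ADMM: draw a fresh permutation before every sweep."""
    rng = random.Random(seed)
    n = len(g)
    x, mu = [0.0] * n, 0.0
    for _ in range(iters):
        order = rng.sample(range(n), n)  # sampling without replacement
        x, mu = admm_sweep(x, mu, H, g, a, b, beta, order)
    return x, mu
```

Passing `order = list(range(n))` to `admm_sweep` recovers the cyclic scheme (4) without proximal terms. A KKT point is a fixed point of a sweep in any order, which gives a simple sanity check for an implementation.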

In contrast to the separable case, studies on the convergence properties of n-block ADMM for (1) with a nonseparable 
objective, even for $n = 2$, are limited. In [29], the authors demonstrated that when problem (1) is convex but not 
necessarily separable* and certain error bound conditions are satisfied, the ADMM iteration converges to some primal-dual 
optimal solution, provided that the stepsize in the update of the multiplier is sufficiently small. Despite this 
conservative nature, the stepsize usually depends on some unknown parameters associated with the error bound, and may thus 
be difficult to compute, which often makes the algorithm less efficient. In view of this, it might be more beneficial to 
employ the classical ADMM (4) (with $\tau = 1$) or its variants with a large stepsize $\tau \geq 1$. However, as mentioned in [31], 
“when the objective function is not separable across the variables, the convergence of the ADMM (4) is still open, 
even in the case where $n = 2$ and $\theta(\cdot)$ is convex.” Along slightly different lines, [14] investigated the convergence of 
a majorized ADMM for the convex optimization problem with a coupled smooth objective function, which includes the 
2-block ADMM (4) for (1) as a special case. Convergence was established for the case when the subproblems of the 
ADMM admit unique solutions and $H$, $A_1$, $A_2$, $R_1$ and $R_2$ satisfy some additional restrictions; see Remark 4.2 in [14] 
for details. Very recently, [21] studied the convergence and ergodic complexity of a 2-block proximal ADMM and its 
variants for nonseparable convex optimization by assuming some additional conditions on the problem data. As the 
positive definite proximal terms are indispensable in the analysis of these algorithms, the results derived in [21] are not 
applicable to the scheme (4) for problem (1) since $R_1$ and $R_2$ are only positive semidefinite. 

In this paper, we analyze the iterate convergence of proximal ADMM (4) and the randomly permuted ADMM for 
solving the nonseparable convex optimization problem (1). The main contributions of our paper are threefold. Firstly, 
we prove that the 2-block proximal ADMM is convergent for (1) only under a condition that ensures the subproblems 
have unique solutions. Our condition is the weakest to ensure iterate convergence for the proximal ADMM since, as we 
will see in Section 2, it is not only sufficient but also necessary for the convergence of the proximal ADMM applied 
to some special problems. Our analysis partially answers the open question mentioned in [31] on the convergence of 
ADMM for nonseparable convex optimization problems. Secondly, we extend the RPADMM proposed in [48] to solve 
the model (1), and prove its expected convergence in the case where $\theta_i = 0$ $(i = 1, 2, \ldots, n)$. This result is a non-trivial 
extension of the convergence result shown in [48], since the objective in (1) is more general and its solution set may 
not be a singleton. Thirdly, when restricted to the unconstrained case, that is, $A_i$ $(i = 1, \ldots, n)$ and $b$ are absent, the 
proximal ADMM and RPADMM reduce to the cyclic proximal block coordinate descent (BCD) method (also known as 
the alternating minimization method), i.e., 

$$
\begin{cases}
x_1^{k+1} := \arg\min_{x_1 \in \mathbb{R}^{d_1}} \big\{ \theta(x_1, x_2^k, \ldots, x_n^k) + \frac{1}{2}\|x_1 - x_1^k\|_{R_1}^2 \big\}, \\
x_2^{k+1} := \arg\min_{x_2 \in \mathbb{R}^{d_2}} \big\{ \theta(x_1^{k+1}, x_2, x_3^k, \ldots, x_n^k) + \frac{1}{2}\|x_2 - x_2^k\|_{R_2}^2 \big\}, \\
\qquad \vdots \\
x_n^{k+1} := \arg\min_{x_n \in \mathbb{R}^{d_n}} \big\{ \theta(x_1^{k+1}, x_2^{k+1}, \ldots, x_{n-1}^{k+1}, x_n) + \frac{1}{2}\|x_n - x_n^k\|_{R_n}^2 \big\},
\end{cases} \tag{5}
$$

and randomly permuted BCD. An implication of our work is the iterate convergence of the 2-block cyclic proximal BCD 
method for the whole sequence and, in particular, the expected convergence of randomly permuted multi-block BCD. 
Although the literature on BCD-type methods is vast (e.g., [3-5,39,43,45,46,49,50]), there are very few results on the 
iterate convergence of BCD-type methods. As mentioned in [7], “in all these works [on BCD or its proximal variants] 
only convergence of the subsequences can be established.” By assuming that the Kurdyka-Łojasiewicz property holds 
on the objective function and the iterates are bounded, [2] and [7] established the iterate convergence of the proximal 
BCD and proximal alternating linearized minimization, respectively. It is clear that these results are also applicable to the 
BCD-type methods for convex minimization problems. While the boundedness assumption on the sequence is typical 
when establishing the iterate convergence of algorithms for nonconvex optimization problems, it might be a bit restrictive to 
assume the boundedness for analyzing the iterate convergence for the convex cases. To the best of our knowledge, our 
convergence result for the 2-block proximal BCD method is the first for the proximal BCD that only requires the unique 
solutions-type condition of the subproblems, rather than any assumptions on the boundedness of the iterates. 

* The models considered in [29,31] are more general than problem (1), as the authors of [29,31] actually allow a general nonseparable 
smooth function in the objective, but in (1) the coupled objective is a quadratic function. 

It has been claimed that randomly permuted BCD (RPBCD, also known as the “sampling without replacement” 
variant of randomized BCD, and called “EPOCHS” in a recent survey [51]) tends to converge faster than the randomized 
BCD [51], with the classical cyclic version performing even worse. Some numerical advantages of RPBCD compared 
with randomized BCD and cyclic BCD were discussed in [45]. In fact, it has been stated that “this kind of 
randomization [RPBCD] has been shown in several contexts to be superior to the sampling with replacement scheme analyzed 
above, but a theoretical understanding of this phenomenon remains elusive” [51]. Randomized BCD (“sampling with 
replacement”) has already been extensively studied [44], but its theoretical analysis does not apply to RPBCD. Although 
the function value convergence results [4,32,49] for cyclic or essentially cyclic BCD can be simply extended to RPBCD, 
these analysis techniques are independent of permutation, so there remains a lack of direct theoretical analysis on the 
iterate convergence of RPBCD. Our expected iterate convergence of RPBCD for quadratic minimization problems can be 
regarded as the first direct analysis on the iterate convergence of the “sampling without replacement” variant of 
randomized BCD. We also prove that RPBCD may have a worse convergence rate than cyclic BCD for quadratic minimization 
problems. Thus, RPBCD should be used with caution for solving general optimization problems. 

The rest of this paper is organized as follows. In Section 2, we prove the iterate convergence of the 2-block proximal 
ADMM and cyclic BCD for linearly constrained optimization problems with a coupled quadratic objective function 
(1) and its unconstrained variant, respectively. Section 3 illustrates the expected convergence of the RPADMM and the 
RPBCD for a class of linearly constrained quadratic optimization problems and its unconstrained variant, respectively. 
Finally, we conclude our paper and present some insights into the use of ADMM and BCD in Section 4. 


2 Convergence of 2-Block Proximal ADMM 

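In this section, we specify $n = 2$ and analyze the iterate convergence of the 2-block proximal ADMM for the 
convex optimization model (1). For notational simplicity, we write 

$$
H := \begin{bmatrix} H_{11} & H_{12} \\ H_{12}^\top & H_{22} \end{bmatrix}, \qquad
R := \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix} \qquad \text{and} \qquad
g := \begin{bmatrix} g_1 \\ g_2 \end{bmatrix},
$$

and define the quadratic function $\phi(x_1, x_2)$ by 

$$
\phi(x_1, x_2) := \frac{1}{2} x_1^\top H_{11} x_1 + x_1^\top H_{12} x_2 + \frac{1}{2} x_2^\top H_{22} x_2 + g_1^\top x_1 + g_2^\top x_2. \tag{6}
$$

Thus the problem under consideration can be written as 

$$
\begin{aligned}
\min\ & \theta(x) := \theta_1(x_1) + \theta_2(x_2) + \phi(x_1, x_2) \\
\text{s.t.}\ & A_1 x_1 + A_2 x_2 = b.
\end{aligned} \tag{7}
$$

Since $\theta_1$ and $\theta_2$ are closed convex functions, there exist two symmetric positive semidefinite matrices $\Sigma_1$ and $\Sigma_2$ 
such that 

$$
(x_1 - \hat{x}_1)^\top (w_1 - \hat{w}_1) \geq \|x_1 - \hat{x}_1\|_{\Sigma_1}^2, \quad \forall\, x_1, \hat{x}_1 \in \operatorname{dom}(\theta_1),\ w_1 \in \partial\theta_1(x_1),\ \hat{w}_1 \in \partial\theta_1(\hat{x}_1) \tag{8}
$$

and 

$$
(x_2 - \hat{x}_2)^\top (w_2 - \hat{w}_2) \geq \|x_2 - \hat{x}_2\|_{\Sigma_2}^2, \quad \forall\, x_2, \hat{x}_2 \in \operatorname{dom}(\theta_2),\ w_2 \in \partial\theta_2(x_2),\ \hat{w}_2 \in \partial\theta_2(\hat{x}_2), \tag{9}
$$

where $\partial\theta_1$ and $\partial\theta_2$ are the subdifferential mappings of $\theta_1$ and $\theta_2$, respectively. By letting 

$$
x := \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
\hat{x} := \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix}, \quad
w := \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}, \quad
\hat{w} := \begin{bmatrix} \hat{w}_1 \\ \hat{w}_2 \end{bmatrix} \quad \text{and} \quad
\Sigma := \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \tag{10}
$$

we have 

$$
(x - \hat{x})^\top (w - \hat{w}) \geq \|x - \hat{x}\|_{\Sigma}^2. \tag{11}
$$

The following lemma establishes the contraction property with respect to the solution set of (7) for the sequence 
generated by (4), which plays an important role in the subsequent analysis. 

Lemma 1 Assume the 2-block proximal ADMM (4) is well defined for problem (7). Let $\{(x_1^k, x_2^k, \mu^k)\}$ be the sequence 
generated by (4). Then, the following statements hold. 

(i) If $(\bar{x}_1, \bar{x}_2, \bar{\mu})$ is any given KKT point of problem (7), then we have 

$$
\begin{aligned}
& \frac{7}{8}\|x^{k+1} - \bar{x}\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k+1} - \bar{x}_2\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k+1} - \bar{\mu}\|^2 + \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{R_2}^2 \\
&\quad \leq \frac{7}{8}\|x^{k} - \bar{x}\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k} - \bar{x}_2\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k} - \bar{\mu}\|^2 + \frac{1}{2}\|x_2^{k} - x_2^{k-1}\|_{R_2}^2 \\
&\qquad - \Big( \frac{1}{16}\|x^{k+1} - x^k\|_{H+\Sigma+8R}^2 + \frac{1}{6}\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2+3\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k+1} - \mu^k\|^2 \Big).
\end{aligned} \tag{12}
$$

(ii) It holds that 

$$
\begin{cases}
\lim_{k\to\infty} d\big(0,\ \partial\theta_1(x_1^{k+1}) + \nabla_{x_1}\phi(x_1^{k+1}, x_2^{k+1}) - A_1^\top \mu^{k+1}\big) = 0, \\
\lim_{k\to\infty} d\big(0,\ \partial\theta_2(x_2^{k+1}) + \nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) - A_2^\top \mu^{k+1}\big) = 0, \\
\lim_{k\to\infty} \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\| = 0,
\end{cases} \tag{13}
$$

where $d(\cdot, \cdot)$ denotes the Euclidean distance of some point to a set. 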
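
As a numerical sanity check of the contraction property in Lemma 1 (i), the sketch below evaluates the quantity that is decreased by the iteration on a scalar instance with $\theta_1 = \theta_2 = 0$, so that $\Sigma = 0$ is a valid choice in (8)-(9). The instance, the function name `lyapunov_values` and the parameter names are illustrative assumptions; the code only checks monotonicity and does not replace the proof.

```python
def lyapunov_values(h11, h12, h22, g1, g2, a1, a2, b, xbar, mubar,
                    beta=1.0, r1=0.5, r2=0.5, iters=30):
    """Run scalar 2-block proximal ADMM (theta_i = 0, Sigma = 0) and return
    the merit value of Lemma 1 (i) at every iterate x^1, x^2, ..."""
    x1 = x2 = mu = 0.0
    values = []
    for _ in range(iters):
        x2_old = x2
        x1 = (-h12 * x2 - g1 + a1 * mu
              - beta * a1 * (a2 * x2 - b) + r1 * x1) / (h11 + beta * a1 * a1 + r1)
        x2 = (-h12 * x1 - g2 + a2 * mu
              - beta * a2 * (a1 * x1 - b) + r2 * x2) / (h22 + beta * a2 * a2 + r2)
        mu = mu - beta * (a1 * x1 + a2 * x2 - b)
        e1, e2, em = x1 - xbar[0], x2 - xbar[1], mu - mubar
        quad_h = h11 * e1 * e1 + 2.0 * h12 * e1 * e2 + h22 * e2 * e2  # ||e||_H^2
        quad_r = r1 * e1 * e1 + r2 * e2 * e2                          # ||e||_R^2
        # (7/8)*||e||^2_{H + (4/7)R} = (7/8)*||e||_H^2 + (1/2)*||e||_R^2
        phi = (7.0 / 8.0) * quad_h + 0.5 * quad_r \
            + 0.5 * (h22 + beta * a2 * a2) * e2 * e2 \
            + em * em / (2.0 * beta) \
            + 0.5 * r2 * (x2 - x2_old) ** 2
        values.append(phi)
    return values
```

With $H = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, $g = (-1, -1)$ and the constraint $x_1 + x_2 = 1$, the KKT point is $(1/2, 1/2)$ with multiplier $1/2$, and the returned values should form a nonincreasing sequence.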


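Proof. (i) From the first-order optimality conditions of (4), we get 

$$
\begin{cases}
0 \in \partial\theta_1(x_1^{k+1}) + \nabla_{x_1}\phi(x_1^{k+1}, x_2^{k}) - A_1^\top \mu^k + \beta A_1^\top (A_1 x_1^{k+1} + A_2 x_2^{k} - b) + R_1(x_1^{k+1} - x_1^k), \\
0 \in \partial\theta_2(x_2^{k+1}) + \nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) - A_2^\top \mu^k + \beta A_2^\top (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b) + R_2(x_2^{k+1} - x_2^k),
\end{cases}
$$

where $\phi(\cdot, \cdot)$ is defined in (6). Using the definitions of $\phi$ and $\mu^{k+1}$, the above formulas imply that 

$$
\begin{cases}
-\nabla_{x_1}\phi(x_1^{k+1}, x_2^{k+1}) + A_1^\top \mu^{k+1} + (H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k) - R_1(x_1^{k+1} - x_1^k) \in \partial\theta_1(x_1^{k+1}), \\
-\nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) + A_2^\top \mu^{k+1} - R_2(x_2^{k+1} - x_2^k) \in \partial\theta_2(x_2^{k+1}).
\end{cases} \tag{14}
$$

Since $(\bar{x}_1, \bar{x}_2, \bar{\mu})$ is a KKT point of (7), we have that 

$$
\begin{cases}
-\nabla_{x_1}\phi(\bar{x}_1, \bar{x}_2) + A_1^\top \bar{\mu} \in \partial\theta_1(\bar{x}_1), \\
-\nabla_{x_2}\phi(\bar{x}_1, \bar{x}_2) + A_2^\top \bar{\mu} \in \partial\theta_2(\bar{x}_2), \\
A_1 \bar{x}_1 + A_2 \bar{x}_2 = b.
\end{cases} \tag{15}
$$

From (11), (14) and (15), we obtain 

$$
\begin{aligned}
\|x^{k+1} - \bar{x}\|_{\Sigma}^2
&\leq (x_1^{k+1} - \bar{x}_1)^\top \Big\{ \big[ -\nabla_{x_1}\phi(x_1^{k+1}, x_2^{k+1}) + A_1^\top \mu^{k+1} + (H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k) - R_1(x_1^{k+1} - x_1^k) \big] \\
&\qquad\qquad - \big[ -\nabla_{x_1}\phi(\bar{x}_1, \bar{x}_2) + A_1^\top \bar{\mu} \big] \Big\} \\
&\quad + (x_2^{k+1} - \bar{x}_2)^\top \Big\{ \big[ -\nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) + A_2^\top \mu^{k+1} - R_2(x_2^{k+1} - x_2^k) \big] - \big[ -\nabla_{x_2}\phi(\bar{x}_1, \bar{x}_2) + A_2^\top \bar{\mu} \big] \Big\} \\
&= -(x_1^{k+1} - \bar{x}_1)^\top A_1^\top (\bar{\mu} - \mu^{k+1}) - (x_2^{k+1} - \bar{x}_2)^\top A_2^\top (\bar{\mu} - \mu^{k+1}) - (x^{k+1} - \bar{x})^\top R (x^{k+1} - x^k) \\
&\quad + (x_1^{k+1} - \bar{x}_1)^\top (H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k) - (x^{k+1} - \bar{x})^\top \big( \nabla\phi(x_1^{k+1}, x_2^{k+1}) - \nabla\phi(\bar{x}_1, \bar{x}_2) \big) \\
&= (x^{k+1} - x^k)^\top R (\bar{x} - x^{k+1}) + \frac{1}{\beta}(\mu^{k+1} - \mu^k)^\top (\bar{\mu} - \mu^{k+1}) \\
&\quad + (x_1^{k+1} - \bar{x}_1)^\top (H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k) - \|x^{k+1} - \bar{x}\|_H^2.
\end{aligned} \tag{16}
$$

By simple manipulations and using $A_1 \bar{x}_1 + A_2 \bar{x}_2 = b$, we have 

$$
\begin{aligned}
& \beta (x_1^{k+1} - \bar{x}_1)^\top A_1^\top A_2 (x_2^{k+1} - x_2^k) \\
&= -\beta (A_2 x_2^{k+1} - A_2 \bar{x}_2)^\top (A_2 x_2^{k+1} - A_2 x_2^k) + \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b)^\top (A_2 x_2^{k+1} - A_2 x_2^k) \\
&= \frac{\beta}{2}\big( \|A_2 x_2^{k} - A_2 \bar{x}_2\|^2 - \|A_2 x_2^{k+1} - A_2 \bar{x}_2\|^2 \big) - \frac{\beta}{2}\|A_2 x_2^{k+1} - A_2 x_2^k\|^2 \\
&\quad + \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b)^\top (A_2 x_2^{k+1} - A_2 x_2^k),
\end{aligned} \tag{17}
$$

$$
(x^{k+1} - x^k)^\top R (\bar{x} - x^{k+1}) = \frac{1}{2}\big( \|x^k - \bar{x}\|_R^2 - \|x^{k+1} - \bar{x}\|_R^2 - \|x^{k+1} - x^k\|_R^2 \big) \tag{18}
$$

and 

$$
\frac{1}{\beta}(\mu^{k+1} - \mu^k)^\top (\bar{\mu} - \mu^{k+1}) = \frac{1}{2\beta}\big( \|\mu^k - \bar{\mu}\|^2 - \|\mu^{k+1} - \bar{\mu}\|^2 - \|\mu^{k+1} - \mu^k\|^2 \big). \tag{19}
$$

On the other hand, it follows from (14) that 

$$
\begin{cases}
-\nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) + A_2^\top \mu^{k+1} - R_2(x_2^{k+1} - x_2^k) \in \partial\theta_2(x_2^{k+1}), \\
-\nabla_{x_2}\phi(x_1^{k}, x_2^{k}) + A_2^\top \mu^{k} - R_2(x_2^{k} - x_2^{k-1}) \in \partial\theta_2(x_2^{k}),
\end{cases}
$$

which, together with (9), implies 

$$
\begin{aligned}
(x_2^{k+1} - x_2^k)^\top \big[ & -\nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) + A_2^\top \mu^{k+1} - R_2(x_2^{k+1} - x_2^k) \\
& + \nabla_{x_2}\phi(x_1^{k}, x_2^{k}) - A_2^\top \mu^{k} + R_2(x_2^{k} - x_2^{k-1}) \big] \geq \|x_2^{k+1} - x_2^k\|_{\Sigma_2}^2.
\end{aligned} \tag{20}
$$

Recall that 

$$
\mu^{k+1} - \mu^k = -\beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b) \quad \text{and} \quad \nabla_{x_2}\phi(x_1, x_2) = H_{12}^\top x_1 + H_{22} x_2 + g_2.
$$

Then, by using the Cauchy-Schwarz inequality, the inequality (20) gives 

$$
\begin{aligned}
& \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b)^\top (A_2 x_2^{k+1} - A_2 x_2^k) \\
&\leq -\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 + (x_2^{k+1} - x_2^k)^\top H_{12}^\top (x_1^{k} - x_1^{k+1}) - \|x_2^{k+1} - x_2^k\|_{R_2}^2 + (x_2^{k+1} - x_2^k)^\top R_2 (x_2^{k} - x_2^{k-1}) \\
&\leq -\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 + (x_2^{k+1} - x_2^k)^\top H_{12}^\top (x_1^{k} - x_1^{k+1}) - \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{R_2}^2 + \frac{1}{2}\|x_2^{k} - x_2^{k-1}\|_{R_2}^2.
\end{aligned}
$$

Substituting (17), (18), (19) and the above inequality into (16), we further get 

$$
\begin{aligned}
& \frac{1}{2}\big( \|x^k - \bar{x}\|_R^2 - \|x^{k+1} - \bar{x}\|_R^2 \big) + \frac{1}{2\beta}\big( \|\mu^k - \bar{\mu}\|^2 - \|\mu^{k+1} - \bar{\mu}\|^2 \big) \\
&\quad + \frac{\beta}{2}\big( \|A_2 x_2^{k} - A_2 \bar{x}_2\|^2 - \|A_2 x_2^{k+1} - A_2 \bar{x}_2\|^2 \big) + \frac{1}{2}\big( \|x_2^{k} - x_2^{k-1}\|_{R_2}^2 - \|x_2^{k+1} - x_2^{k}\|_{R_2}^2 \big) \\
&\geq \|x^{k+1} - \bar{x}\|_{H+\Sigma}^2 + \frac{1}{2}\|x^{k+1} - x^k\|_R^2 + \frac{1}{2\beta}\|\mu^{k+1} - \mu^k\|^2 + \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{\beta A_2^\top A_2}^2 \\
&\quad - (x_2^{k+1} - x_2^k)^\top H_{12}^\top (x_1^{k} - \bar{x}_1) + \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2.
\end{aligned} \tag{21}
$$

Moreover, it follows from the Cauchy-Schwarz inequality and $H + \Sigma \succeq 0$ that 

$$
\begin{aligned}
& (x_2^{k+1} - x_2^k)^\top H_{12}^\top (x_1^{k} - \bar{x}_1) - \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 \\
&= (x_2^{k+1} - x_2^k)^\top H_{12}^\top (x_1^{k} - \bar{x}_1) + (x_2^{k+1} - x_2^k)^\top (H_{22} + \Sigma_2)(x_2^{k} - \bar{x}_2) - (x_2^{k+1} - x_2^k)^\top (H_{22} + \Sigma_2)(x_2^{k+1} - \bar{x}_2) \\
&= \begin{bmatrix} 0 \\ x_2^{k+1} - x_2^k \end{bmatrix}^\top (H + \Sigma)(x^k - \bar{x}) - (x_2^{k+1} - x_2^k)^\top (H_{22} + \Sigma_2)(x_2^{k+1} - \bar{x}_2) \\
&\leq \frac{3}{4}\|x^k - \bar{x}\|_{H+\Sigma}^2 + \frac{1}{3}\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 - \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 \\
&\quad + \frac{1}{2}\big( \|x_2^{k} - \bar{x}_2\|_{H_{22}+\Sigma_2}^2 - \|x_2^{k+1} - \bar{x}_2\|_{H_{22}+\Sigma_2}^2 \big) \\
&= \frac{3}{4}\|x^k - \bar{x}\|_{H+\Sigma}^2 - \frac{1}{6}\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2}^2 + \frac{1}{2}\big( \|x_2^{k} - \bar{x}_2\|_{H_{22}+\Sigma_2}^2 - \|x_2^{k+1} - \bar{x}_2\|_{H_{22}+\Sigma_2}^2 \big).
\end{aligned} \tag{22}
$$

Using the elementary inequality $2(\|a\|_{H+\Sigma}^2 + \|b\|_{H+\Sigma}^2) \geq \|a - b\|_{H+\Sigma}^2$, we have 

$$
\begin{aligned}
\|x^{k+1} - \bar{x}\|_{H+\Sigma}^2 - \frac{3}{4}\|x^{k} - \bar{x}\|_{H+\Sigma}^2
&= \frac{7}{8}\big( \|x^{k+1} - \bar{x}\|_{H+\Sigma}^2 - \|x^{k} - \bar{x}\|_{H+\Sigma}^2 \big) + \frac{1}{8}\big( \|x^{k+1} - \bar{x}\|_{H+\Sigma}^2 + \|x^{k} - \bar{x}\|_{H+\Sigma}^2 \big) \\
&\geq \frac{7}{8}\big( \|x^{k+1} - \bar{x}\|_{H+\Sigma}^2 - \|x^{k} - \bar{x}\|_{H+\Sigma}^2 \big) + \frac{1}{16}\|x^{k+1} - x^k\|_{H+\Sigma}^2.
\end{aligned} \tag{23}
$$

Substituting (22) and (23) into (21), we get (12). 

(ii) From (12), we can immediately see that 

$$
\sum_{k=1}^{\infty} \Big( \frac{1}{16}\|x^{k+1} - x^k\|_{H+\Sigma+8R}^2 + \frac{1}{6}\|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2+3\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k+1} - \mu^k\|^2 \Big) < \infty, \tag{24}
$$

and it therefore holds that 

$$
\lim_{k\to\infty} \|x^{k+1} - x^k\|_{H+\Sigma+8R} = 0, \qquad \lim_{k\to\infty} \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2+3\beta A_2^\top A_2} = 0 \tag{25}
$$

and 

$$
\lim_{k\to\infty} \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\| = \lim_{k\to\infty} \frac{1}{\beta}\|\mu^{k+1} - \mu^k\| = 0. \tag{26}
$$

Since $H + \Sigma$, $R$ and $H_{22} + \Sigma_2$ are positive semidefinite matrices, we deduce from (25) that 

$$
\begin{cases}
\lim_{k\to\infty} (H + \Sigma)(x^{k+1} - x^k) = 0, \\
\lim_{k\to\infty} R(x^{k+1} - x^k) = 0, \\
\lim_{k\to\infty} \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2} = 0, \\
\lim_{k\to\infty} \|A_2(x_2^{k+1} - x_2^k)\| = 0,
\end{cases} \tag{27}
$$

and hence 

$$
\lim_{k\to\infty} \big[ (H_{11} + \Sigma_1)(x_1^{k+1} - x_1^k) + H_{12}(x_2^{k+1} - x_2^k) \big] = 0. \tag{28}
$$

Using the triangle inequality, we have 

$$
\left\| \begin{bmatrix} x_1^{k+1} - x_1^k \\ 0 \end{bmatrix} \right\|_{H+\Sigma}
\leq \left\| \begin{bmatrix} x_1^{k+1} - x_1^k \\ x_2^{k+1} - x_2^k \end{bmatrix} \right\|_{H+\Sigma}
+ \left\| \begin{bmatrix} 0 \\ x_2^{k+1} - x_2^k \end{bmatrix} \right\|_{H+\Sigma},
$$

and thus from (27), it follows 

$$
\lim_{k\to\infty} \|x_1^{k+1} - x_1^k\|_{H_{11}+\Sigma_1} \leq \lim_{k\to\infty} \big( \|x^{k+1} - x^k\|_{H+\Sigma} + \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2} \big) = 0.
$$

From (27), (28) and the above formula, we obtain 

$$
\begin{cases}
\lim_{k\to\infty} R_1(x_1^{k+1} - x_1^k) = 0, \\
\lim_{k\to\infty} R_2(x_2^{k+1} - x_2^k) = 0, \\
\lim_{k\to\infty} H_{12}(x_2^{k+1} - x_2^k) = -\lim_{k\to\infty} (H_{11} + \Sigma_1)(x_1^{k+1} - x_1^k) = 0, \\
\lim_{k\to\infty} A_2(x_2^{k+1} - x_2^k) = 0.
\end{cases} \tag{29}
$$

This, together with (14) and (26), proves the assertion (13). □ 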

To establish the convergence of ADMM, we make the following assumption: 

Assumption 1 We assume 

$$
\begin{bmatrix} H_{11} & 0 \\ 0 & H_{22} \end{bmatrix}
+ \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
+ \begin{bmatrix} A_1^\top A_1 & 0 \\ 0 & A_2^\top A_2 \end{bmatrix}
+ \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix} \succ 0. \tag{30}
$$

It is worth emphasizing that Assumption 1 means that the subproblems of 2-block proximal ADMM admit unique 
solutions, because Assumption 1 holds if and only if 

$$
\begin{bmatrix} H_{11} & 0 \\ 0 & H_{22} \end{bmatrix}
+ \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
+ \beta \begin{bmatrix} A_1^\top A_1 & 0 \\ 0 & A_2^\top A_2 \end{bmatrix}
+ \begin{bmatrix} R_1 & 0 \\ 0 & R_2 \end{bmatrix} \succ 0
$$

for any $\beta > 0$. However, the optimal solution to the original problem (7) is not necessarily unique. 

We are now ready to prove the iterate convergence of the 2-block proximal ADMM for the nonseparable convex 
optimization model (7). 


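Theorem 1 Suppose Assumption 1 holds. Let $\{(x_1^k, x_2^k, \mu^k)\}$ be generated by the proximal ADMM (4) with $n = 2$ to 
solve problem (7). Then the sequence $\{(x_1^k, x_2^k, \mu^k)\}$ converges to a KKT point of (7). 

Proof. It follows from (12) that the sequences $\{(H + \Sigma + R)x^{k+1}\}$, $\{(H_{22} + \Sigma_2 + \beta A_2^\top A_2 + R_2)x_2^{k+1}\}$ and $\{\mu^{k+1}\}$ are 
all bounded. Since $H_{22} + \Sigma_2 + \beta A_2^\top A_2 + R_2$ is positive definite, we know $\{x_2^{k+1}\}$ is bounded. Note that $A_1 \bar{x}_1 + A_2 \bar{x}_2 = b$. 
Using the triangle inequalities 

$$
\begin{aligned}
\|A_1(x_1^{k+1} - \bar{x}_1)\| &\leq \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - (A_1 \bar{x}_1 + A_2 \bar{x}_2)\| + \|A_2(x_2^{k+1} - \bar{x}_2)\| \\
&= \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\| + \|A_2(x_2^{k+1} - \bar{x}_2)\|
\end{aligned}
$$

and 

$$
\begin{aligned}
\|x_1^{k+1} - \bar{x}_1\|_{H_{11}+\Sigma_1+R_1}
= \left\| \begin{bmatrix} x_1^{k+1} - \bar{x}_1 \\ 0 \end{bmatrix} \right\|_{H+\Sigma+R}
&\leq \left\| \begin{bmatrix} x_1^{k+1} - \bar{x}_1 \\ x_2^{k+1} - \bar{x}_2 \end{bmatrix} \right\|_{H+\Sigma+R}
+ \left\| \begin{bmatrix} 0 \\ x_2^{k+1} - \bar{x}_2 \end{bmatrix} \right\|_{H+\Sigma+R} \\
&= \|x^{k+1} - \bar{x}\|_{H+\Sigma+R} + \|x_2^{k+1} - \bar{x}_2\|_{H_{22}+\Sigma_2+R_2},
\end{aligned}
$$

we further obtain the boundedness of the sequences $\{A_1 x_1^{k+1}\}$ and $\{(H_{11} + \Sigma_1 + R_1)x_1^{k+1}\}$, and hence 
$\{(H_{11} + \Sigma_1 + \beta A_1^\top A_1 + R_1)x_1^{k+1}\}$ is bounded. Together with the positive definiteness of $H_{11} + \Sigma_1 + \beta A_1^\top A_1 + R_1$, this implies 
the boundedness of $\{x_1^{k+1}\}$. Thus, the sequence $\{(x_1^k, x_2^k, \mu^k)\}$ is bounded and there exist a triple $(x_1^\infty, x_2^\infty, \mu^\infty)$ and a 
subsequence $\{k_i\}$ such that 

$$
\lim_{i\to\infty} x_1^{k_i} = x_1^\infty, \qquad \lim_{i\to\infty} x_2^{k_i} = x_2^\infty \qquad \text{and} \qquad \lim_{i\to\infty} \mu^{k_i} = \mu^\infty.
$$

Setting $k = k_i - 1$ and invoking the upper semicontinuity of $\partial\theta_1$ and $\partial\theta_2$ in (13), we then obtain 

$$
\begin{cases}
-\nabla_{x_1}\phi(x_1^\infty, x_2^\infty) + A_1^\top \mu^\infty \in \partial\theta_1(x_1^\infty), \\
-\nabla_{x_2}\phi(x_1^\infty, x_2^\infty) + A_2^\top \mu^\infty \in \partial\theta_2(x_2^\infty), \\
A_1 x_1^\infty + A_2 x_2^\infty - b = 0,
\end{cases}
$$

which means $(x_1^\infty, x_2^\infty, \mu^\infty)$ is a KKT point of (7). Hence (12) is also valid if $(\bar{x}_1, \bar{x}_2, \bar{\mu})$ is replaced by $(x_1^\infty, x_2^\infty, \mu^\infty)$. 
Therefore, it holds for any $k \geq k_i$ that 

$$
\begin{aligned}
& \frac{7}{8}\|x^{k+1} - x^\infty\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k+1} - x_2^\infty\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k+1} - \mu^\infty\|^2 + \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{R_2}^2 \\
&\quad \leq \frac{7}{8}\|x^{k_i} - x^\infty\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k_i} - x_2^\infty\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k_i} - \mu^\infty\|^2 + \frac{1}{2}\|x_2^{k_i} - x_2^{k_i-1}\|_{R_2}^2.
\end{aligned} \tag{31}
$$

It follows from (29) that 

$$
\lim_{k\to\infty} \|x_2^{k+1} - x_2^k\|_{R_2} = 0.
$$

Note that 

$$
\lim_{i\to\infty} \Big( \frac{7}{8}\|x^{k_i} - x^\infty\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k_i} - x_2^\infty\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k_i} - \mu^\infty\|^2 + \frac{1}{2}\|x_2^{k_i} - x_2^{k_i-1}\|_{R_2}^2 \Big) = 0,
$$

and so we can deduce from (31) that 

$$
\lim_{k\to\infty} \Big( \frac{7}{8}\|x^{k+1} - x^\infty\|_{H+\Sigma+\frac{4}{7}R}^2 + \frac{1}{2}\|x_2^{k+1} - x_2^\infty\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2}^2 + \frac{1}{2\beta}\|\mu^{k+1} - \mu^\infty\|^2 + \frac{1}{2}\|x_2^{k+1} - x_2^k\|_{R_2}^2 \Big) = 0, \tag{32}
$$

which implies 

$$
\lim_{k\to\infty} \|x_2^{k+1} - x_2^\infty\|_{H_{22}+\Sigma_2+\beta A_2^\top A_2+R_2}^2 = 0, \qquad \lim_{k\to\infty} \mu^{k+1} = \mu^\infty
$$

and 

$$
\lim_{k\to\infty} \|x^{k+1} - x^\infty\|_{H+\Sigma+R}^2 = 0. \tag{33}
$$

Since $H_{22} + \Sigma_2 + \beta A_2^\top A_2 + R_2$ is positive definite, we obtain 

$$
\lim_{k\to\infty} x_2^{k+1} = x_2^\infty. \tag{34}
$$

On the other hand, by (13) and (33), it can easily be seen that 

$$
\begin{aligned}
\|A_1(x_1^{k+1} - x_1^\infty)\| &\leq \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - (A_1 x_1^\infty + A_2 x_2^\infty)\| + \|A_2(x_2^{k+1} - x_2^\infty)\| \\
&= \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\| + \|A_2(x_2^{k+1} - x_2^\infty)\| \to 0,
\end{aligned}
$$

as $k \to \infty$. Then, we obtain 

$$
\begin{aligned}
\|x_1^{k+1} - x_1^\infty\|_{H_{11}+\Sigma_1+\beta A_1^\top A_1+R_1}^2
&= \|x_1^{k+1} - x_1^\infty\|_{H_{11}+\Sigma_1+R_1}^2 + \beta\|A_1(x_1^{k+1} - x_1^\infty)\|^2 \\
&= \left\| \begin{bmatrix} x_1^{k+1} - x_1^\infty \\ 0 \end{bmatrix} \right\|_{H+\Sigma+R}^2 + \beta\|A_1(x_1^{k+1} - x_1^\infty)\|^2 \\
&\leq \left( \left\| \begin{bmatrix} x_1^{k+1} - x_1^\infty \\ x_2^{k+1} - x_2^\infty \end{bmatrix} \right\|_{H+\Sigma+R} + \left\| \begin{bmatrix} 0 \\ x_2^{k+1} - x_2^\infty \end{bmatrix} \right\|_{H+\Sigma+R} \right)^2 + \beta\|A_1(x_1^{k+1} - x_1^\infty)\|^2 \\
&= \big( \|x^{k+1} - x^\infty\|_{H+\Sigma+R} + \|x_2^{k+1} - x_2^\infty\|_{H_{22}+\Sigma_2+R_2} \big)^2 + \beta\|A_1(x_1^{k+1} - x_1^\infty)\|^2,
\end{aligned}
$$

where “$\leq$” follows from the triangle inequality of norms. Together with (32), (33), (34), and the positive definiteness of 
$H_{11} + \Sigma_1 + \beta A_1^\top A_1 + R_1$, this shows that 

$$
\lim_{k\to\infty} x_1^{k+1} = x_1^\infty.
$$

Therefore, we have shown that the whole sequence $\{(x_1^k, x_2^k, \mu^k)\}$ converges to $(x_1^\infty, x_2^\infty, \mu^\infty)$, which is a KKT point 
of (7). This completes the proof. □ 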

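Remark 1 In fact, the iterate convergence of 2-block proximal ADMM can also be guaranteed if there is a fixed stepsize 
$\gamma \in (0, (1 + \sqrt{5})/2)$ in the dual update. Namely, the proximal ADMM can be extended as follows: 

$$
\begin{cases}
x_1^{k+1} := \arg\min_{x_1 \in \mathbb{R}^{d_1}} \big\{ \mathcal{L}_\beta(x_1, x_2^k; \mu^k) + \frac{1}{2}\|x_1 - x_1^k\|_{R_1}^2 \big\}, \\
x_2^{k+1} := \arg\min_{x_2 \in \mathbb{R}^{d_2}} \big\{ \mathcal{L}_\beta(x_1^{k+1}, x_2; \mu^k) + \frac{1}{2}\|x_2 - x_2^k\|_{R_2}^2 \big\}, \\
\mu^{k+1} := \mu^k - \gamma\beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b),
\end{cases} \tag{35}
$$

where $\beta > 0$ and $\gamma \in (0, (1 + \sqrt{5})/2)$. Under the conditions of Theorem 1, we can similarly prove the global iterate 
convergence of (35). For brevity, we omit the details here. 

Remark 2 The proximal ADMM includes the ADMM and its linearized version as special cases. When $R_1 = 0$ and 
$R_2 = 0$, the proximal ADMM reduces to the ADMM and, according to Theorem 1, its convergence can be established 
under the condition that 

$$
\begin{bmatrix} H_{11} & 0 \\ 0 & H_{22} \end{bmatrix}
+ \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}
+ \begin{bmatrix} A_1^\top A_1 & 0 \\ 0 & A_2^\top A_2 \end{bmatrix} \succ 0.
$$

The ADMM can be easily applied to the convex minimization problems where $\theta_i$ $(i = 1, 2)$ have closed-form proximal 
operators and all the matrices $H_{22}$, $A_1$, $A_2$ are diagonal. Otherwise, we consider the linearized ADMM: 

$$
\begin{cases}
x_1^{k+1/2} := \big[ (r_1 I - H_{11} - \beta A_1^\top A_1)x_1^k + A_1^\top \mu^k - \beta A_1^\top (A_2 x_2^k - b) - H_{12} x_2^k - g_1 \big] / r_1, \\
x_1^{k+1} := \arg\min_{x_1 \in \mathbb{R}^{d_1}} \theta_1(x_1) + \frac{r_1}{2}\|x_1 - x_1^{k+1/2}\|^2, \\
x_2^{k+1/2} := \big[ (r_2 I - H_{22} - \beta A_2^\top A_2)x_2^k + A_2^\top \mu^k - \beta A_2^\top (A_1 x_1^{k+1} - b) - H_{12}^\top x_1^{k+1} - g_2 \big] / r_2, \\
x_2^{k+1} := \arg\min_{x_2 \in \mathbb{R}^{d_2}} \theta_2(x_2) + \frac{r_2}{2}\|x_2 - x_2^{k+1/2}\|^2, \\
\mu^{k+1} := \mu^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b),
\end{cases}
$$

which is equivalent to the proximal ADMM with $R_1 = r_1 I - H_{11} - \beta A_1^\top A_1$ and $R_2 = r_2 I - H_{22} - \beta A_2^\top A_2$. Thus the 
iterate convergence of linearized ADMM can be guaranteed under the condition that 

$$
r_i > \max_{1 \leq j \leq d_i} \lambda_j (H_{ii} + \beta A_i^\top A_i), \qquad i = 1, 2,
$$

where $\lambda_j(\cdot)$ represents the $j$th eigenvalue of a matrix. 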
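
The linearized ADMM of Remark 2 only needs the proximal operators of $\theta_1$ and $\theta_2$. The sketch below instantiates it for scalar blocks with the assumed choice $\theta_i(x_i) = \lambda |x_i|$, whose proximal operator is soft-thresholding; this choice of $\theta_i$, the default $r_i = H_{ii} + \beta A_i^2 + 1$, and the names `soft_threshold` and `linearized_admm_l1` are illustrative assumptions rather than part of the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|.|: argmin_x t*|x| + 0.5*(x - v)^2."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def linearized_admm_l1(h11, h12, h22, g1, g2, a1, a2, b, lam,
                       beta=1.0, r1=None, r2=None, iters=500):
    """Scalar linearized ADMM with theta_i(x) = lam*|x| (assumed for illustration)."""
    # Choose r_i strictly above h_ii + beta*a_i^2, as required in Remark 2.
    if r1 is None:
        r1 = h11 + beta * a1 * a1 + 1.0
    if r2 is None:
        r2 = h22 + beta * a2 * a2 + 1.0
    x1 = x2 = mu = 0.0
    for _ in range(iters):
        half = ((r1 - h11 - beta * a1 * a1) * x1 + a1 * mu
                - beta * a1 * (a2 * x2 - b) - h12 * x2 - g1) / r1
        x1 = soft_threshold(half, lam / r1)   # prox step on theta_1
        half = ((r2 - h22 - beta * a2 * a2) * x2 + a2 * mu
                - beta * a2 * (a1 * x1 - b) - h12 * x1 - g2) / r2
        x2 = soft_threshold(half, lam / r2)   # prox step on theta_2
        mu -= beta * (a1 * x1 + a2 * x2 - b)
    return x1, x2, mu
```

For $H = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, $g = (-1, -1)$, the constraint $x_1 + x_2 = 1$ and $\lambda = 0.2$, symmetry and strict convexity give the solution $x_1 = x_2 = 1/2$, and the KKT conditions then give $\mu = 1/2 + \lambda = 0.7$.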

By using the following proposition (see [16, Lemma 1.1] and [34, Lemma 3]), we can derive an $o(1/k)$ convergence 
rate of the proximal ADMM, measured by the square of the KKT violation. 

Proposition 1 For any sequence $\{a_k\} \subset \mathbb{R}$ satisfying $a_k \geq 0$ and $\sum_{k=1}^{\infty} a_k < +\infty$, it holds that $\min_{1 \leq j \leq k}\{a_j\} = o(1/k)$. 
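
For the reader's convenience, one standard way to see why Proposition 1 holds is sketched here; this is a short argument we are confident of, not a reproduction of the proofs in [16] or [34]. The index range $\lceil k/2 \rceil, \ldots, k$ contains at least $k/2$ terms, each of which is at least the minimum, so

$$
\frac{k}{2} \min_{1 \leq j \leq k} a_j \;\leq\; \sum_{j=\lceil k/2 \rceil}^{k} a_j \;\leq\; \sum_{j=\lceil k/2 \rceil}^{\infty} a_j \;\longrightarrow\; 0 \quad \text{as } k \to \infty,
$$

where the limit follows because the tail of a convergent series of nonnegative terms vanishes. Hence $k \cdot \min_{1 \leq j \leq k} a_j \to 0$.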

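Theorem 2 Suppose Assumption 1 holds. Let $\{(x_1^k, x_2^k, \mu^k)\}$ be generated by the proximal ADMM (4) with $n = 2$ to 
solve problem (7). Then, we have 

$$
\begin{aligned}
\min_{1 \leq i \leq k} \Big[ & d^2\big(0,\ \partial\theta_1(x_1^{i+1}) + \nabla_{x_1}\phi(x_1^{i+1}, x_2^{i+1}) - A_1^\top \mu^{i+1}\big) + d^2\big(0,\ \partial\theta_2(x_2^{i+1}) + \nabla_{x_2}\phi(x_1^{i+1}, x_2^{i+1}) - A_2^\top \mu^{i+1}\big) \\
& + \|A_1 x_1^{i+1} + A_2 x_2^{i+1} - b\|^2 \Big] = o(1/k).
\end{aligned} \tag{36}
$$

Proof. From (14) and (4), we obtain 

$$
\begin{cases}
-R_1(x_1^{k+1} - x_1^k) + (H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k) \in \partial\theta_1(x_1^{k+1}) + \nabla_{x_1}\phi(x_1^{k+1}, x_2^{k+1}) - A_1^\top \mu^{k+1}, \\
-R_2(x_2^{k+1} - x_2^k) \in \partial\theta_2(x_2^{k+1}) + \nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) - A_2^\top \mu^{k+1}, \\
\|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\|^2 = \frac{1}{\beta^2}\|\mu^{k+1} - \mu^k\|^2.
\end{cases}
$$

By using the Cauchy-Schwarz inequality and the above formulas, we obtain 

$$
\begin{aligned}
& d^2\big(0,\ \partial\theta_1(x_1^{k+1}) + \nabla_{x_1}\phi(x_1^{k+1}, x_2^{k+1}) - A_1^\top \mu^{k+1}\big) + d^2\big(0,\ \partial\theta_2(x_2^{k+1}) + \nabla_{x_2}\phi(x_1^{k+1}, x_2^{k+1}) - A_2^\top \mu^{k+1}\big) \\
&\quad + \|A_1 x_1^{k+1} + A_2 x_2^{k+1} - b\|^2 \\
&\leq 2\|R_1(x_1^{k+1} - x_1^k)\|^2 + 2\|(H_{12} + \beta A_1^\top A_2)(x_2^{k+1} - x_2^k)\|^2 + \|R_2(x_2^{k+1} - x_2^k)\|^2 + \frac{1}{\beta^2}\|\mu^{k+1} - \mu^k\|^2 \\
&\leq 2\|R_1^{1/2}\|^2 \|x_1^{k+1} - x_1^k\|_{R_1}^2 + \big( 2\|H_{12} + \beta A_1^\top A_2\|^2 + \|R_2\|^2 \big)\|x_2^{k+1} - x_2^k\|^2 + \frac{1}{\beta^2}\|\mu^{k+1} - \mu^k\|^2.
\end{aligned} \tag{37}
$$

It follows from (24) that 

$$
\begin{cases}
\sum_{k=1}^{\infty} \|x_1^{k+1} - x_1^k\|_{R_1}^2 < \infty, \\
\sum_{k=1}^{\infty} \|x_2^{k+1} - x_2^k\|_{H_{22}+\Sigma_2+A_2^\top A_2+R_2}^2 < \infty, \\
\sum_{k=1}^{\infty} \|\mu^{k+1} - \mu^k\|^2 < \infty.
\end{cases} \tag{38}
$$

Since $H_{22} + \Sigma_2 + A_2^\top A_2 + R_2 \succ 0$, from (38) we have 

$$
\sum_{k=1}^{\infty} \|x_2^{k+1} - x_2^k\|^2 < \infty. \tag{39}
$$

Combining Proposition 1 with the relationships (38) and (39), we have 

$$
\min_{1 \leq i \leq k} \Big\{ 2\|R_1^{1/2}\|^2 \|x_1^{i+1} - x_1^i\|_{R_1}^2 + \big( 2\|H_{12} + \beta A_1^\top A_2\|^2 + \|R_2\|^2 \big)\|x_2^{i+1} - x_2^i\|^2 + \frac{1}{\beta^2}\|\mu^{i+1} - \mu^i\|^2 \Big\} = o(1/k),
$$

which, together with (37), implies (36). We complete the proof. □ 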

We remark that, in some sense, Assumption 1 actually acts as the weakest condition to guarantee the iterate convergence 
of the proximal ADMM for solving problem (7). Firstly, if Assumption 1 is violated, the solution sets of 
subproblems in (4) might be empty, in which case the 2-block proximal ADMM scheme is not well defined (see [12] for 
an illustration). Secondly, the following corollary shows that Assumption 1 is not only sufficient, but also necessary for 
the iterate convergence of the 2-block proximal ADMM for solving the coupled quadratic minimization problem. Thus, 
the conditions we proposed are already tight. 

Corollary 1 Assume problem (7) is a convex quadratic programming problem, that is, $\theta_1(x_1) = 0$ and $\theta_2(x_2) = 0$. 
Then, any sequence generated by the 2-block proximal ADMM is convergent if and only if Assumption 1 holds. 

Proof. The “if” part follows immediately from Theorem 1. For the “only if” part, we prove that if Assumption 1 
fails to hold, there must exist some sequence generated by the 2-block proximal ADMM that is divergent. Indeed, let 
$\{(x_1^k, x_2^k, \mu^k)\}$ be a sequence generated by the 2-block proximal ADMM, i.e., 

$$
\begin{cases}
x_1^{k+1} \in \arg\min_{x_1 \in \mathbb{R}^{d_1}} \big\{ \mathcal{L}_\beta(x_1, x_2^k; \mu^k) + \frac{1}{2}\|x_1 - x_1^k\|_{R_1}^2 \big\}, \\
x_2^{k+1} \in \arg\min_{x_2 \in \mathbb{R}^{d_2}} \big\{ \mathcal{L}_\beta(x_1^{k+1}, x_2; \mu^k) + \frac{1}{2}\|x_2 - x_2^k\|_{R_2}^2 \big\}, \\
\mu^{k+1} = \mu^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).
\end{cases} \tag{40}
$$

If the sequence is divergent, then the “only if” part of this corollary holds. Thus we need only consider the case where 
$\{(x_1^k, x_2^k, \mu^k)\}$ converges. Because $H_{ii} + \beta A_i^\top A_i + R_i$ $(i = 1, 2)$ are not both positive definite, there exists a nonzero vector 
$(y_1, y_2)$ such that 

$$
(H_{ii} + \beta A_i^\top A_i + R_i) y_i = 0 \quad \forall\, i = 1, 2,
$$

or equivalently, 

$$
H_{ii} y_i = 0, \quad A_i y_i = 0 \quad \text{and} \quad R_i y_i = 0 \quad \forall\, i = 1, 2. \tag{41}
$$

Using the fact that $0 \preceq H \preceq 2 \begin{bmatrix} H_{11} & 0 \\ 0 & H_{22} \end{bmatrix}$, we have $Hy = 0$. Hence, it holds that 

$$
H_{12} y_2 = 0 \quad \text{and} \quad H_{12}^\top y_1 = 0. \tag{42}
$$

By (40), (41) and (42), it can easily be seen that, for any $k \geq 1$, 

$$
\begin{cases}
x_1^{2k} + y_1 \in \arg\min_{x_1} \big\{ \mathcal{L}_\beta(x_1, x_2^{2k-1}; \mu^{2k-1}) + \frac{1}{2}\|x_1 - x_1^{2k-1}\|_{R_1}^2 \big\}, \\
x_2^{2k} + y_2 \in \arg\min_{x_2} \big\{ \mathcal{L}_\beta(x_1^{2k} + y_1, x_2; \mu^{2k-1}) + \frac{1}{2}\|x_2 - x_2^{2k-1}\|_{R_2}^2 \big\}, \\
\mu^{2k} = \mu^{2k-1} - \beta \big( A_1(x_1^{2k} + y_1) + A_2(x_2^{2k} + y_2) - b \big)
\end{cases}
$$

and 

$$
\begin{cases}
x_1^{2k+1} \in \arg\min_{x_1} \big\{ \mathcal{L}_\beta(x_1, x_2^{2k} + y_2; \mu^{2k}) + \frac{1}{2}\|x_1 - (x_1^{2k} + y_1)\|_{R_1}^2 \big\}, \\
x_2^{2k+1} \in \arg\min_{x_2} \big\{ \mathcal{L}_\beta(x_1^{2k+1}, x_2; \mu^{2k}) + \frac{1}{2}\|x_2 - (x_2^{2k} + y_2)\|_{R_2}^2 \big\}, \\
\mu^{2k+1} = \mu^{2k} - \beta (A_1 x_1^{2k+1} + A_2 x_2^{2k+1} - b).
\end{cases}
$$

This means that the divergent sequence $(x_1^1, x_2^1, \mu^1) \to (x_1^2 + y_1, x_2^2 + y_2, \mu^2) \to (x_1^3, x_2^3, \mu^3) \to (x_1^4 + y_1, x_2^4 + y_2, \mu^4) \to \cdots$ 
could be generated by the 2-block proximal ADMM. Thus, Assumption 1 is also necessary for the iterate convergence. 
This completes the proof. □ 

When restricted to the case that $A_i$ ($i = 1, 2$) and $b$ are absent, the 2-block proximal ADMM reduces to the
2-block cyclic proximal BCD method. Our analysis of proximal ADMM provides an iterate convergence result for the
2-block cyclic proximal BCD method without assuming the boundedness of the iterates, but only requiring a condition 
to ensure the uniqueness of the subproblem solutions. This result is an important supplement to traditional studies on 
BCD, which have mainly focused on subsequence convergence and the complexity of the function values, and enables a 
better understanding of the performance of this method. 

Corollary 2 Assume $\Sigma_i + H_{ii} + R_i \succ 0$ for $i = 1, 2$. Let $\{(x_1^k, x_2^k)\}$ be generated by the cyclic proximal BCD (5) with $n = 2$ to
solve the following unconstrained optimization problem:

$$ \min_{x \in \mathbb{R}^d} \; \theta_1(x_1) + \theta_2(x_2) + \frac{1}{2} x^T H x + g^T x. \tag{43} $$

Then the whole sequence $\{(x_1^k, x_2^k)\}$ converges to an optimal solution of (43).


Remark 3 Similar to the proximal ADMM, the proximal BCD includes BCD and its linearized version (also known as
BCPG) as special cases. When $R_1 = 0$ and $R_2 = 0$, the proximal BCD reduces to BCD and, according to Theorem 1,
its convergence can be established under the condition that

$$ \begin{bmatrix} H_{11} & 0 \\ 0 & H_{22} \end{bmatrix} + \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \succ 0. $$

The BCPG is a combination of the proximal gradient method and BCD, which can be easily implemented when the $\theta_i$ have
closed-form proximal operators. Specifically, it takes the form

$$
\begin{cases}
x_1^{k+1/2} = x_1^k - [H_{11} x_1^k + H_{12} x_2^k + g_1]/r_1, \\
x_1^{k+1} := \arg\min_{x_1 \in \mathbb{R}^{d_1}} \theta_1(x_1) + \frac{r_1}{2}\|x_1 - x_1^{k+1/2}\|^2, \\
x_2^{k+1/2} = x_2^k - [H_{12}^T x_1^{k+1} + H_{22} x_2^k + g_2]/r_2, \\
x_2^{k+1} := \arg\min_{x_2 \in \mathbb{R}^{d_2}} \theta_2(x_2) + \frac{r_2}{2}\|x_2 - x_2^{k+1/2}\|^2,
\end{cases}
$$

which is equivalent to the proximal BCD with $R_1 = r_1 I - H_{11}$ and $R_2 = r_2 I - H_{22}$. Thus the iterate convergence of
linearized ADMM can be guaranteed under the condition that

$$ r_i > \max_{1 \le j \le d_i} \lambda_j(H_{ii}), \quad i = 1, 2. $$
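The BCPG update in Remark 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the `margin` added to $\lambda_{\max}(H_{ii})$ when choosing $r_i$, the zero starting point and the fixed iteration count are all choices made here, not part of the paper. Each `prox_i(v, r)` is assumed to return $\arg\min_x \theta_i(x) + \frac{r}{2}\|x - v\|^2$.

```python
import numpy as np

def bcpg_two_block(H, g, d1, prox1, prox2, iters=2000, margin=0.1):
    """Two-block BCPG for min theta1(x1) + theta2(x2) + 0.5*x'Hx + g'x."""
    H11, H12, H22 = H[:d1, :d1], H[:d1, d1:], H[d1:, d1:]
    g1, g2 = g[:d1], g[d1:]
    # r_i slightly above the largest eigenvalue of H_ii (assumed margin),
    # so that R_i = r_i*I - H_ii is positive definite.
    r1 = np.linalg.eigvalsh(H11).max() + margin
    r2 = np.linalg.eigvalsh(H22).max() + margin
    x1, x2 = np.zeros(d1), np.zeros(H.shape[0] - d1)
    for _ in range(iters):
        # Gradient step on block 1, followed by the proximal step for theta1.
        x1 = prox1(x1 - (H11 @ x1 + H12 @ x2 + g1) / r1, r1)
        # Block 2 uses the freshly updated x1 (Gauss-Seidel order).
        x2 = prox2(x2 - (H12.T @ x1 + H22 @ x2 + g2) / r2, r2)
    return np.concatenate([x1, x2])

def prox_zero(v, r):
    """Proximal operator of theta = 0, i.e. the identity map."""
    return v

def prox_l1(v, r, tau=0.5):
    """Soft-thresholding: proximal operator of theta = tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau / r, 0.0)
```

With `prox_zero` for both blocks the iteration solves $Hx = -g$ when $H \succ 0$; passing `prox_l1` instead gives an $\ell_1$-regularized variant, where `tau` is a hypothetical regularization weight.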


3 Convergence of Multi-block RPADMM and RPBCD

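As shown in [10], the convergence result for 2-block ADMM obtained in the previous section cannot be extended to the
multi-block case, i.e., $n \ge 3$. To remove the possibility of divergence, we use randomly permuted ADMM (RPADMM)
to solve the nonseparable optimization problem (1). Specifically, RPADMM first picks a permutation $\sigma$ of $\{1, \ldots, n\}$
uniformly at random, and then iterates as follows:

$$
\begin{cases}
x_{\sigma(1)}^{k+1} := \arg\min_{x_{\sigma(1)}} \mathcal{L}_\beta(x_{\sigma(1)}, x_{\sigma(2)}^k, \ldots, x_{\sigma(n)}^k; \mu^k), \\
x_{\sigma(2)}^{k+1} := \arg\min_{x_{\sigma(2)}} \mathcal{L}_\beta(x_{\sigma(1)}^{k+1}, x_{\sigma(2)}, x_{\sigma(3)}^k, \ldots, x_{\sigma(n)}^k; \mu^k), \\
\qquad \vdots \\
x_{\sigma(n)}^{k+1} := \arg\min_{x_{\sigma(n)}} \mathcal{L}_\beta(x_{\sigma(1)}^{k+1}, x_{\sigma(2)}^{k+1}, \ldots, x_{\sigma(n-1)}^{k+1}, x_{\sigma(n)}; \mu^k), \\
\mu^{k+1} := \mu^k - \beta\left(\sum_{i=1}^n A_i x_i^{k+1} - b\right),
\end{cases} \tag{44}
$$

where the permuted augmented Lagrangian function $\mathcal{L}_\beta(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}; \mu)$ is defined by

$$ \mathcal{L}_\beta(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}; \mu) := \mathcal{L}_\beta(x_1, x_2, \ldots, x_n; \mu). $$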


It has been shown [48] that RPADMM is convergent in expectation for solving the nonsingular square system of 
linear equations. To extend their result to the nonseparable convex optimization model (1), it is natural to first study 
whether RPADMM is even convergent in expectation for solving the following simpler linearly constrained quadratic 
minimization problem 


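$$
\begin{array}{ll}
\min & \theta(x) := \frac{1}{2} x^T H x + g^T x \\
\text{s.t.} & \sum_{i=1}^n A_i x_i = b,
\end{array} \tag{45}
$$

where $H$ can be partitioned into $n \times n$ blocks $H_{ij} \in \mathbb{R}^{d_i \times d_j}$ ($1 \le i, j \le n$) accordingly. In this section, we provide an
affirmative answer to the above question under the following assumption.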


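Assumption 2 Assume

$$
\begin{bmatrix} H_{11} & 0 & \cdots & 0 \\ 0 & H_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & H_{nn} \end{bmatrix}
+ \beta \begin{bmatrix} A_1^T A_1 & 0 & \cdots & 0 \\ 0 & A_2^T A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_n^T A_n \end{bmatrix} \succ 0.
$$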


Although our current result is restricted to nonseparable quadratic minimization, a special case of (1), it serves as a good
indicator of the expected convergence of RPADMM in more general cases. It is noteworthy that our result is a non-trivial 
extension of the result in [48], because, in our setting, the problem under consideration is more general. For example, 
the optimal solution set of (45) is not necessarily a singleton, in which case the spectral radius of the algorithm mapping 
may not be strictly less than 1, although this fact played a key role in establishing their result. 
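To make scheme (44) concrete for problem (45), the following Python sketch performs one RPADMM cycle when every block is a single coordinate ($d_i = 1$). It is an illustration under assumptions made here, not the authors' code: the function names `rpadmm_sweep` and `solve_kkt`, the scalar-block restriction, and the sign convention $\mathcal{L}_\beta(x; \mu) = \theta(x) - \mu^T(Ax - b) + \frac{\beta}{2}\|Ax - b\|^2$ (inferred from the multiplier update in (44)) are all assumptions.

```python
import numpy as np

def rpadmm_sweep(x, mu, H, g, A, b, beta, rng):
    """One RPADMM cycle for min 0.5*x'Hx + g'x s.t. Ax = b, scalar blocks."""
    S = H + beta * A.T @ A                # Hessian of L_beta with respect to x
    rhs = A.T @ mu + beta * A.T @ b - g
    x = x.copy()
    for i in rng.permutation(x.size):     # fresh uniformly random order
        # Exact minimization of L_beta over coordinate i, others held fixed.
        off_diag = S[i] @ x - S[i, i] * x[i]
        x[i] = (rhs[i] - off_diag) / S[i, i]
    mu = mu - beta * (A @ x - b)          # multiplier update
    return x, mu

def solve_kkt(H, g, A, b):
    """Solve Hx - A'mu = -g, Ax = b directly (assumes a nonsingular KKT matrix)."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-g, b]))
    return sol[:n], sol[n:]
```

A useful sanity check is that a KKT point is a fixed point of the sweep for every permutation: starting `rpadmm_sweep` from the output of `solve_kkt` should return the same pair up to rounding.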


3.1 Proof Outline and Preliminaries 


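For convenience, we follow the notation in [48], and describe the iterative scheme of RPADMM in a matrix form. Let
$L_\sigma \in \mathbb{R}^{d \times d}$ be an $n \times n$ block matrix defined by

$$
(L_\sigma)_{\sigma(i), \sigma(j)} := \begin{cases} H_{\sigma(i)\sigma(j)} + \beta A_{\sigma(i)}^T A_{\sigma(j)}, & \text{if } i \ge j, \\ 0, & \text{otherwise}, \end{cases}
$$

and $R_\sigma$ be defined as

$$ R_\sigma := L_\sigma - (H + \beta A^T A) := L_\sigma - S. \tag{46} $$

By setting $z := (x, \mu)$, the randomly permuted ADMM can be viewed as a fixed point iteration

$$ z^{k+1} := M_\sigma z^k + \bar{L}_\sigma^{-1} \bar{b}, \tag{47} $$

where

$$
M_\sigma := \bar{L}_\sigma^{-1} \bar{R}_\sigma, \quad
\bar{L}_\sigma := \begin{bmatrix} L_\sigma & 0 \\ \beta A & I \end{bmatrix}, \quad
\bar{R}_\sigma := \begin{bmatrix} R_\sigma & A^T \\ 0 & I \end{bmatrix}, \quad
\bar{b} := \begin{bmatrix} -g + \beta A^T b \\ \beta b \end{bmatrix}.
$$

Define the matrix $Q$ by

$$ Q := E_\sigma(L_\sigma^{-1}) = \frac{1}{n!} \sum_{\sigma \in \Gamma} L_\sigma^{-1}, \tag{48} $$

and $M$ by

$$ M := E_\sigma(M_\sigma) = \frac{1}{n!} \sum_{\sigma \in \Gamma} M_\sigma, \tag{49} $$

where $\Gamma$ is the set of all permutations of $\{1, 2, \ldots, n\}$. By direct computation, we can easily see that

$$
M = \begin{bmatrix} I - QS & QA^T \\ -\beta A + \beta AQS & I - \beta AQA^T \end{bmatrix}. \tag{50}
$$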


To prove the expected convergence of the RPADMM (44) for problem (45) under Assumption 2, we will use a similar, 
but not identical, structure as that introduced in [48], which consists of the following main steps: 

(1) $\mathrm{eig}(QS) \subset [0, \frac{4}{3})$;

(2) For any eigenvalue $\lambda$ of $M$, $\mathrm{eig}(QS) \subset [0, \frac{4}{3})$ implies that $|\lambda| < 1$ or $\lambda = 1$;

(3) If 1 is an eigenvalue of M, then the eigenvalue 1 has a complete set of eigenvectors; 

(4) Items (2) and (3) imply the convergence in expectation of the RPADMM. 

To prove the above items, we need the following linear algebra lemmas, whose proofs can be found in the Appendix. 


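Lemma 2 Suppose that Assumption 2 holds, $S \in \mathbb{R}^{d \times d}$ is a symmetric matrix defined by (46) and $Q$ is defined by (48).
Then, the matrix $Q$ is positive definite and all the eigenvalues of $QS$ lie in $[0, \frac{4}{3})$, i.e.,

$$ \mathrm{eig}(QS) \subset \left[0, \tfrac{4}{3}\right). \tag{51} $$

Lemma 3 Let $S$ and $T$ be two symmetric positive semidefinite matrices in $\mathbb{R}^{d \times d}$. Then, there exists a polynomial $p(x)$
such that

$$ \det\left((\lambda - 1)^2 I + (2\lambda - 1) S + (\lambda - 1) T\right) = (\lambda - 1)^l p(\lambda) $$

and $p(1) > 0$, where $\det(\cdot)$ denotes the determinant of some matrix, $l = 2d - \mathrm{Rank}(S) - \mathrm{Rank}(S + T)$ and $\mathrm{Rank}(\cdot)$
denotes the rank of some matrix.

Lemma 4 Suppose $S \in \mathbb{R}^{d \times d}$ is a symmetric matrix defined by (46) and $\beta > 0$; then

$$ \mathrm{Rank}\begin{bmatrix} S & -A^T \\ \beta A & 0 \end{bmatrix} = \mathrm{Rank}(S) + \mathrm{Rank}(\beta A^T A). $$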


Here, Lemma 2, Step (1) of the proof structure, is an enhanced version of Lemma 2 in [48] that is compatible with 
problem (45). The proofs of Steps (2) and (3), which reveal the essential nature of this extension and are hence the key 
contributions here, will be presented in Subsection 3.2. The proof for Step (4) is given in Subsection 3.3. 
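Steps (1) to (3) can be checked numerically on a small instance. The sketch below assumes scalar blocks ($d_i = 1$), enumerates all permutations to build $Q$ from (48) and $M$ from (50), and then inspects the spectra. The function name, the data `h` and `a`, and the tolerances are assumptions made for illustration; the instance is deliberately degenerate (rank-one $H$, a repeated constraint row) so that the solution set is not a singleton and 1 is an eigenvalue of $M$.

```python
import itertools
import numpy as np

def expected_update_matrix(H, A, beta):
    """Return S, Q and M of (46), (48) and (50) for scalar blocks (d_i = 1)."""
    n, m = H.shape[0], A.shape[0]
    S = H + beta * A.T @ A
    Q = np.zeros((n, n))
    perms = list(itertools.permutations(range(n)))
    for sigma in perms:
        L = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1):        # keep S[sigma(i), sigma(j)] for j <= i
                L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
        Q += np.linalg.inv(L)
    Q /= len(perms)
    M = np.block([
        [np.eye(n) - Q @ S, Q @ A.T],
        [-beta * A + beta * A @ Q @ S, np.eye(m) - beta * A @ Q @ A.T],
    ])
    return S, Q, M

# Hypothetical degenerate data: H has rank one and A has two identical rows.
h = np.array([1.0, -1.0, 2.0, 0.5])
a = np.array([1.0, 2.0, -1.0, 1.0])
H = np.outer(h, h)
A = np.vstack([a, a])
S, Q, M = expected_update_matrix(H, A, beta=1.0)

eig_QS = np.linalg.eigvals(Q @ S).real      # Step (1): expected in [0, 4/3)
eig_M = np.linalg.eigvals(M)                # Step (2): |lambda| < 1 or lambda = 1
# Step (3): multiplicities of the eigenvalue 1 (tolerances are assumptions).
algebraic = int(np.sum(np.abs(eig_M - 1.0) < 1e-8))
geometric = M.shape[0] - int(np.linalg.matrix_rank(np.eye(M.shape[0]) - M, tol=1e-10))
```

For this instance, formula (64) gives $m + d - \mathrm{Rank}(\beta A^T A) - \mathrm{Rank}(\beta A^T A + H) = 2 + 4 - 1 - 2 = 3$, so both multiplicity counts are expected to equal 3.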
3.2 Eigenvalues of the Expected Update Matrix


One of the main differences between the nonsingular linear system case and that of the extended case is reflected in the 
following lemma, where 1 can be an eigenvalue of the expected update matrix M. 

Lemma 5 Suppose that Assumption 2 holds and $S \in \mathbb{R}^{d \times d}$ is a symmetric matrix defined by (46). Let $\lambda$ be any eigenvalue
of $M$; then we have either $|\lambda| < 1$ or $\lambda = 1$.


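Proof. We introduce the following notation:

$$ \gamma(u) := \frac{\beta u^* A^T A u}{u^* S u} \quad \text{for all } u \in \mathbb{C}^d \text{ such that } Su \neq 0, \tag{52} $$

where $u^*$ is the conjugate transpose of $u$. Recalling that $S = H + \beta A^T A$, we know

$$ 0 \le \gamma(u) \le 1 \quad \text{for all } u \in \mathbb{C}^d \text{ such that } Su \neq 0. \tag{53} $$

Similarly, we define

$$ \kappa(u) := \frac{u^* Q^{-1} u}{u^* S u} \quad \text{for all } u \in \mathbb{C}^d \text{ such that } Su \neq 0. \tag{54} $$

Note that $\mathrm{eig}(QS) < \frac{4}{3}$ by Lemma 2. Thus, we know that $Q^{-1} - \frac{3}{4} S \succ 0$, and therefore

$$ 0 < \kappa(u)^{-1} < \frac{4}{3} \quad \text{for all } u \in \mathbb{C}^d \text{ such that } Su \neq 0. \tag{55} $$

Note that $M$ can be factorized as

$$ M = \begin{bmatrix} I & 0 \\ -\beta A & I \end{bmatrix} \begin{bmatrix} I - QS & QA^T \\ 0 & I \end{bmatrix}. \tag{56} $$

Switching the order of the products, we obtain a new matrix

$$ M' := \begin{bmatrix} I - QS & QA^T \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -\beta A & I \end{bmatrix} = \begin{bmatrix} I - QS - \beta QA^T A & QA^T \\ -\beta A & I \end{bmatrix}. \tag{57} $$

Note that $\mathrm{eig}(M) = \mathrm{eig}(M')$. Thus, it suffices to show that either $\rho(M') < 1$ or 1 is an eigenvalue of $M'$.

Let $\left(\lambda, \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}\right)$ be an eigenpair of $M'$, namely,

$$ \begin{bmatrix} I - QS - \beta QA^T A & QA^T \\ -\beta A & I \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \lambda \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, $$

which implies

$$ (I - QS - \beta QA^T A) v_1 + QA^T v_2 = \lambda v_1, \tag{58} $$

$$ -\beta A v_1 + v_2 = \lambda v_2. \tag{59} $$

Equality (59) gives

$$ (1 - \lambda) v_2 = \beta A v_1. \tag{60} $$

Suppose $\lambda \neq 1$. Hence, it holds that

$$ v_2 = \frac{\beta}{1 - \lambda} A v_1. $$

Clearly, this relation implies that $v_1 \neq 0$. Substituting the above relation into (58), we have

$$ QS v_1 = (1 - \lambda) v_1 + \frac{\lambda \beta}{1 - \lambda} QA^T A v_1. $$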
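Using the nonsingularity of $Q$, the above equality can be written as

$$ S v_1 = (1 - \lambda) Q^{-1} v_1 + \frac{\lambda \beta}{1 - \lambda} A^T A v_1. $$

Multiplying both sides of the above equality by $v_1^*$, we arrive at

$$ v_1^* S v_1 = (1 - \lambda) v_1^* Q^{-1} v_1 + \frac{\lambda \beta}{1 - \lambda} v_1^* A^T A v_1. \tag{61} $$

We claim that $v_1^* S v_1 \neq 0$. Otherwise, $v_1^* A^T A v_1 = 0$ and therefore $\lambda = 1$ from the inequality $v_1^* Q^{-1} v_1 > 0$ and (61).
This contradicts our assumption that $\lambda \neq 1$. Multiplying both sides of (61) by $(v_1^* S v_1)^{-1}$ and substituting the definitions
(52) and (54) into the above relation, we obtain the following key equality with respect to $\lambda$:

$$ 1 = (1 - \lambda) \kappa(v_1) + \frac{\lambda}{1 - \lambda} \gamma(v_1), $$

which can be further reformulated as

$$ \kappa(v_1) \lambda^2 - (2\kappa(v_1) - \gamma(v_1) - 1) \lambda + \kappa(v_1) - 1 = 0. $$

Because $\kappa(v_1)$ is positive, we have

$$ \lambda^2 + \left(\kappa(v_1)^{-1}(\gamma(v_1) + 1) - 2\right) \lambda + \left(1 - \kappa(v_1)^{-1}\right) = 0. \tag{62} $$

The discriminant of the quadratic equation in (62) is

$$
\begin{aligned}
\Delta &= \left(\kappa(v_1)^{-1}(\gamma(v_1) + 1) - 2\right)^2 - 4\left(1 - \kappa(v_1)^{-1}\right) \\
&= \kappa(v_1)^{-1} \left(\kappa(v_1)^{-1}(\gamma(v_1) + 1)^2 - 4\gamma(v_1)\right).
\end{aligned} \tag{63}
$$

Note that

$$ 0 \le \frac{4\gamma(v_1)}{(\gamma(v_1) + 1)^2} \le 1 $$

holds as a result of (53). Recalling (55), we consider the following two cases.

Case 1: $0 < \kappa(v_1)^{-1} < \frac{4\gamma(v_1)}{(\gamma(v_1) + 1)^2}$. This means the discriminant $\Delta < 0$, and the two solutions of (62) satisfy

$$ |\lambda_{1,2}| = \sqrt{\lambda_1 \lambda_2} = \sqrt{1 - \kappa(v_1)^{-1}} < 1. $$

Case 2: $\frac{4\gamma(v_1)}{(\gamma(v_1) + 1)^2} \le \kappa(v_1)^{-1} < \frac{4}{3}$. This means the discriminant $\Delta \ge 0$, and the two solutions are real. Let

$$ f(\lambda) := \lambda^2 + \left(\kappa(v_1)^{-1}(\gamma(v_1) + 1) - 2\right) \lambda + \left(1 - \kappa(v_1)^{-1}\right). $$

By (53) and (55), we know that

$$
\begin{cases}
f(1) = \kappa(v_1)^{-1} \gamma(v_1) \ge 0, \\
f(-1) = 4 - \kappa(v_1)^{-1} (\gamma(v_1) + 2) > 0, \\
\lambda_1 + \lambda_2 = 2 - \kappa(v_1)^{-1} (\gamma(v_1) + 1) \in (-2, 2),
\end{cases}
$$

which, together with $\lambda \neq 1$, establishes that $|\lambda| < 1$.

Thus, it can be concluded that either $\lambda = 1$ or $|\lambda| < 1$ holds. □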

We now consider the case where M has an eigenvalue equal to 1 and show that it has a complete set of eigenvectors. 


Lemma 6 Suppose that Assumption 2 holds, and $M \in \mathbb{R}^{(m+d) \times (m+d)}$ is a matrix defined by (50). Suppose that 1 is an
eigenvalue of $M$; then the algebraic multiplicity of 1 for $M$ equals its geometric multiplicity. Namely, the eigenvalue 1
has a complete set of eigenvectors.
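Proof. By direct computation, it holds that

$$
\begin{aligned}
\det(\lambda I - M) &= \det \begin{bmatrix} (\lambda - 1) I + QS & -QA^T \\ \beta A - \beta AQS & (\lambda - 1) I + \beta AQA^T \end{bmatrix} \\
&= \det \begin{bmatrix} (\lambda - 1) I + QS & -QA^T \\ \lambda \beta A & (\lambda - 1) I \end{bmatrix} \\
&= \det \begin{bmatrix} (\lambda - 1) I + QS + \frac{\lambda \beta}{\lambda - 1} QA^T A & -QA^T \\ 0 & (\lambda - 1) I \end{bmatrix} \\
&= (\lambda - 1)^{m - d} \det\left[(\lambda - 1)^2 I + (2\lambda - 1) \beta QA^T A + (\lambda - 1) QH\right] \\
&= (\lambda - 1)^{m - d} \det\left[(\lambda - 1)^2 I + (2\lambda - 1) \beta Q^{1/2} A^T A Q^{1/2} + (\lambda - 1) Q^{1/2} H Q^{1/2}\right].
\end{aligned}
$$

This, together with Lemma 3, shows that the algebraic multiplicity of 1 for $M$ equals

$$
\begin{aligned}
& m - d + 2d - \mathrm{Rank}(Q^{1/2} \beta A^T A Q^{1/2}) - \mathrm{Rank}(Q^{1/2} (\beta A^T A + H) Q^{1/2}) \\
&\quad = m + d - \mathrm{Rank}(\beta A^T A) - \mathrm{Rank}(\beta A^T A + H),
\end{aligned} \tag{64}
$$

where the equality follows from $Q \succ 0$ by Lemma 2. In addition, the geometric multiplicity of 1 for $M$ is identical to
the following quantity:

$$
\begin{aligned}
m + d - \mathrm{Rank}(I - M) &= m + d - \mathrm{Rank} \begin{bmatrix} QS & -QA^T \\ \beta A - \beta AQS & \beta AQA^T \end{bmatrix} \\
&= m + d - \mathrm{Rank} \begin{bmatrix} QS & -QA^T \\ \beta A & 0 \end{bmatrix} \\
&= m + d - \mathrm{Rank} \begin{bmatrix} S & -A^T \\ \beta A & 0 \end{bmatrix},
\end{aligned} \tag{65}
$$

where the second equality follows from the rank invariant property under elementary transformation, and the final
equality holds because $Q \succ 0$ by Lemma 2. Combining (64), (65), Lemma 4, and the definition of $S$, we derive the
desired conclusion. □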


3.3 Expected Convergence 


Step (4) can be formulated as the following theorem. 

Theorem 3 Assume Assumption 2 holds. Suppose RPADMM (44) is employed to solve the nonseparable quadratic
programming (45). Then, the expected iterative sequence converges to some KKT point of (45).

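Proof. Let $(\bar{x}, \bar{\mu})$ be a KKT point of (45), i.e.,

$$ \begin{bmatrix} H & -A^T \\ \beta A & 0 \end{bmatrix} \begin{bmatrix} \bar{x} \\ \bar{\mu} \end{bmatrix} = \begin{bmatrix} -g \\ \beta b \end{bmatrix}. \tag{66} $$

Denote by $(x^k, \mu^k)$ the $k$th iterate of the algorithm. It follows from (47) and (66) that

$$ E_\sigma[x^{k+1} - \bar{x}; \mu^{k+1} - \bar{\mu}] = M E_\sigma[x^k - \bar{x}; \mu^k - \bar{\mu}]. $$

By Lemma 5, we know that $\rho(M) \le 1$. We proceed with the proof by considering the following two cases.
Case 1: $\rho(M) < 1$. It holds that $E_\sigma x^k \to \bar{x}$ and $E_\sigma \mu^k \to \bar{\mu}$ as $k \to \infty$. Theorem 3 is valid.
Case 2: $\rho(M) = 1$. By Lemmas 5 and 6, we know that all eigenvalues of $M$ with modulus 1 must be 1, which has a complete
set of eigenvectors. As a result, $M$ admits the following Jordan decomposition:

$$
M = P^{-1} \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & \rho_1 & * & \\ & & & & \ddots & * \\ & & & & & \rho_t \end{bmatrix} P,
$$

where $P$ is a nonsingular matrix and $|\rho_i| < 1$ for all $i = 1, \ldots, t$. It is easily verified that

$$
M^k \to P^{-1} \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix} P
$$

as $k \to \infty$, and therefore the sequence $\{E_\sigma[x^k - \bar{x}; \mu^k - \bar{\mu}]\}$ converges to an eigenvector of $M$ associated with
the eigenvalue 1, say $[x^0; \mu^0]$. Then

$$ (I - M)[x^0; \mu^0] = 0, $$

which, after some manipulation, shows that

$$ \begin{bmatrix} H & -A^T \\ \beta A & 0 \end{bmatrix} \begin{bmatrix} x^0 \\ \mu^0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{67} $$

Therefore, $E_\sigma x^k \to \bar{x} + x^0$ and $E_\sigma \mu^k \to \bar{\mu} + \mu^0$ with

$$ \begin{bmatrix} H & -A^T \\ \beta A & 0 \end{bmatrix} \begin{bmatrix} \bar{x} + x^0 \\ \bar{\mu} + \mu^0 \end{bmatrix} = \begin{bmatrix} -g \\ \beta b \end{bmatrix}. \tag{68} $$

This means that $(\bar{x} + x^0, \bar{\mu} + \mu^0)$ is a KKT point of (45).

This completes the proof. □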
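The recursion used in the proof, $E[z^{k+1}] = M\,E[z^k] + E_\sigma[\bar{L}_\sigma^{-1}]\,\bar{b}$ with $E_\sigma[\bar{L}_\sigma^{-1}] = \begin{bmatrix} Q & 0 \\ -\beta AQ & I \end{bmatrix}$, can be iterated directly to watch the expected iterate approach a KKT point. The sketch below again assumes scalar blocks; the function names, the zero starting point and the iteration count are assumptions made for illustration only.

```python
import itertools
import numpy as np

def mean_inverse(S):
    """Q = average of L_sigma^{-1} over all permutations (scalar blocks)."""
    n = S.shape[0]
    perms = list(itertools.permutations(range(n)))
    Q = np.zeros((n, n))
    for sigma in perms:
        L = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1):
                L[sigma[i], sigma[j]] = S[sigma[i], sigma[j]]
        Q += np.linalg.inv(L)
    return Q / len(perms)

def expected_rpadmm(H, g, A, b, beta=1.0, iters=5000):
    """Iterate E[z^{k+1}] = M E[z^k] + E[Lbar_sigma^{-1}] bbar from z^0 = 0."""
    n, m = H.shape[0], A.shape[0]
    S = H + beta * A.T @ A
    Q = mean_inverse(S)
    M = np.block([
        [np.eye(n) - Q @ S, Q @ A.T],
        [-beta * A + beta * A @ Q @ S, np.eye(m) - beta * A @ Q @ A.T],
    ])
    # Expected inverse of Lbar_sigma = [[L_sigma, 0], [beta*A, I]].
    Linv = np.block([[Q, np.zeros((n, m))], [-beta * A @ Q, np.eye(m)]])
    bbar = np.concatenate([-g + beta * A.T @ b, beta * b])
    z = np.zeros(n + m)
    for _ in range(iters):
        z = M @ z + Linv @ bbar
    return z[:n], z[n:]
```

The returned pair can be checked against the KKT conditions $Hx - A^T\mu = -g$ and $Ax = b$ from (66).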

One byproduct of Theorem 3 is the expected convergence result for RPBCD when applied to convex quadratic optimization.
To the best of our knowledge, this is the first expected iterate convergence result of RPBCD.

Corollary 3 Assume $H_{ii} \succ 0$ for $i = 1, 2, \ldots, n$. If RPBCD is used to solve the unconstrained quadratic programming
problem

$$ \min_{x \in \mathbb{R}^d} \; \frac{1}{2} x^T H x + g^T x, \tag{69} $$

then the expected iterative sequence converges to an optimal solution of (69).


3.4 Convergence Rate Comparison to Cyclic BCD 

There is a common perception that RPBCD dominates cyclic BCD in terms of performance (see [51], for example). In 
this subsection, we theoretically show that this is not generally true. Consider the quadratic programming problem (69), 
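where $x$ is split into two blocks $(x_1, x_2)$ with $x_1 \in \mathbb{R}^{d_1}$ and $x_2 \in \mathbb{R}^{d_2}$, and $d = d_1 + d_2$. Accordingly, we denote

$$ H = \begin{bmatrix} H_{11} & H_{12} \\ H_{12}^T & H_{22} \end{bmatrix}. $$

By applying different minimizing orders to the variables, the cyclic BCD (Gauss-Seidel method) has the following two
iterative schemes:

$$
x^{k+1} = M_1 x^k - \begin{bmatrix} H_{11} & 0 \\ H_{12}^T & H_{22} \end{bmatrix}^{-1} g
\quad \text{and} \quad
x^{k+1} = M_2 x^k - \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix}^{-1} g,
$$

where

$$
M_1 = \begin{bmatrix} H_{11} & 0 \\ H_{12}^T & H_{22} \end{bmatrix}^{-1} \begin{bmatrix} 0 & -H_{12} \\ 0 & 0 \end{bmatrix}
\quad \text{and} \quad
M_2 = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix}^{-1} \begin{bmatrix} 0 & 0 \\ -H_{12}^T & 0 \end{bmatrix}. \tag{70}
$$

The asymptotic convergence rates of these two iterative schemes are $\rho(M_1)$ and $\rho(M_2)$, respectively. In this case, the
expected asymptotic convergence rate of RPBCD is $\rho((M_1 + M_2)/2)$. The following proposition reveals the relationship
between these rates.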


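Proposition 2 Suppose $H_{11} \succ 0$ and $H_{22} \succ 0$. Let $M_1$ and $M_2$ be defined by (70), and $M_3 = (M_1 + M_2)/2$. Then, it
holds that

$$ \rho(M_1) = \rho(M_2) \le \rho(M_3). $$

Proof. Without loss of generality, we need only consider the situation where $H_{ii} = I_{d_i}$ for $i = 1, 2$ and $d_1 \ge d_2$,
because the similarity transformation $M \mapsto PMP^{-1}$ does not change the spectrum of $M$, where
$P = \begin{bmatrix} H_{11}^{1/2} & 0 \\ 0 & H_{22}^{1/2} \end{bmatrix}$. In this case, a simple calculation yields

$$
M_1 = \begin{bmatrix} 0 & -H_{12} \\ 0 & H_{12}^T H_{12} \end{bmatrix}
\quad \text{and} \quad
M_2 = \begin{bmatrix} H_{12} H_{12}^T & 0 \\ -H_{12}^T & 0 \end{bmatrix}. \tag{71}
$$

Let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{d_2}$ be the eigenvalues of $H_{12}^T H_{12}$. Recall that $H \succeq 0$ and $H_{ii} = I_{d_i}$ for $i = 1, 2$. Then, we have
that $\sigma_i \in [0, 1]$, $i = 1, \ldots, d_2$, and obtain from (71) that

$$ \rho(M_1) = \rho(M_2) = \sigma_1. $$

Clearly,

$$ M_3 = \frac{1}{2} \begin{bmatrix} H_{12} H_{12}^T & -H_{12} \\ -H_{12}^T & H_{12}^T H_{12} \end{bmatrix}. $$

By direct computation, it holds that

$$
\begin{aligned}
\det(\lambda I - M_3) &= \left(\tfrac{1}{2}\right)^d \det \begin{bmatrix} 2\lambda I - H_{12} H_{12}^T & H_{12} \\ H_{12}^T & 2\lambda I - H_{12}^T H_{12} \end{bmatrix} \\
&= \left(\tfrac{1}{2}\right)^d (2\lambda)^{d_1 - d_2} \det\left[4\lambda^2 I - (4\lambda + 1) H_{12}^T H_{12} + (H_{12}^T H_{12})^2\right] \\
&= \left(\tfrac{1}{2}\right)^d (2\lambda)^{d_1 - d_2} \prod_{i=1}^{d_2} \left(4\lambda^2 - (4\lambda + 1) \sigma_i + \sigma_i^2\right),
\end{aligned}
$$

and so the eigenvalues of $M_3$ are 0 (multiplicity $d_1 - d_2$) and $\frac{\sigma_i \pm \sqrt{\sigma_i}}{2}$ for $i = 1, 2, \ldots, d_2$. Because $\sigma_i \in [0, 1]$, we
have that

$$ \rho(M_3) = \frac{\sigma_1 + \sqrt{\sigma_1}}{2} \ge \sigma_1 = \rho(M_1) = \rho(M_2). $$

This completes the proof. □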

Therefore, although random permutation does indeed make multi-block ADMM and BCD more robust, especially for 
“bad” or diverging problems, cyclic ADMM or BCD may still perform well, or even better, for solving “nice” problems. 
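Proposition 2 is easy to probe numerically. The following sketch builds $M_1$, $M_2$ and $M_3$ from (70) for a given $H$ and block size $d_1$ and returns their spectral radii; the helper names and the tridiagonal test matrix are assumptions chosen here for illustration.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def bcd_rates(H, d1):
    """Return (rho(M1), rho(M2), rho(M3)) for the 2-block splitting of H."""
    d2 = H.shape[0] - d1
    H11, H12, H22 = H[:d1, :d1], H[:d1, d1:], H[d1:, d1:]
    Z11, Z12 = np.zeros((d1, d1)), np.zeros((d1, d2))
    Z21, Z22 = np.zeros((d2, d1)), np.zeros((d2, d2))
    lower = np.block([[H11, Z12], [H12.T, H22]])   # order: x1 first, then x2
    upper = np.block([[H11, H12], [Z21, H22]])     # order: x2 first, then x1
    M1 = np.linalg.solve(lower, np.block([[Z11, -H12], [Z21, Z22]]))
    M2 = np.linalg.solve(upper, np.block([[Z11, Z12], [-H12.T, Z22]]))
    M3 = 0.5 * (M1 + M2)                           # expected RPBCD update
    return spectral_radius(M1), spectral_radius(M2), spectral_radius(M3)

# Hypothetical test matrix: tridiagonal, positive definite, split as 2 + 2.
H = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
rho1, rho2, rho3 = bcd_rates(H, d1=2)
```

For this matrix the normalized coupling block has $\sigma_1 = 4/9$, so the closed-form expressions in the proof predict $\rho(M_1) = \rho(M_2) = 4/9$ and $\rho(M_3) = (4/9 + 2/3)/2 = 5/9$, which the computed values should match up to rounding.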
4 Concluding Remarks

In this paper, we have demonstrated the point-wise or iterate convergence of the classical 2-block ADMM for solving 
convex optimization problems with coupled quadratic objective functions under a mild assumption. This assumption 
becomes necessary and sufficient for the global convergence of the ADMM when the objective is a quadratic function.
This result partially answers, in the affirmative, the open question arising in [31] on the convergence of ADMM
for nonseparable optimization problems. We also derived the expected convergence of RPADMM in solving linearly 
constrained coupled quadratic optimization problems. This is a non-trivial extension of the convergence analysis given 
in [48], which is only applicable to nonsingular linear systems. When the linear constraint is absent, the proximal ADMM 
and RPADMM reduce to the cyclic proximal BCD and RPBCD. Thus, this study has provided new convergence results 
for BCD-type methods. In particular, we have established the first iterate convergence result for 2-block cyclic proximal 
BCD without assuming the boundedness of the iterates and the expected iterate convergence of RPBCD for multi-block 
convex quadratic optimization. We also theoretically demonstrated that RPBCD does not necessarily dominate cyclic 
BCD. Although the results for RPADMM and RPBCD are restricted to quadratic minimization models, they provide 
some interesting insights on the use of these methods: 1) random permutation makes multi-block ADMM and BCD more 
robust for multi-block convex minimization problems; 2) cyclic BCD may outperform RPBCD for “nice” problems, and 
therefore RPBCD should be applied with caution when solving general multi-block convex optimization problems. 

Two challenging open questions concern the extension of our convergence results for RPADMM and RPBCD to more 
general convex optimization problems, and an exploration of the global convergence rate of RPADMM and RPBCD. In 
particular, it would be interesting to know which problems are better suited to RPADMM or RPBCD. 
Appendix.


Appendix A. The proof of Lemma 2 is similar to, but not exactly the same as, that of [48, Lemma 2]. Since $S$ is allowed
to be singular here, we also need to show the positive definiteness of $Q$ by mathematical induction. For completeness, we
provide a concise proof here. Interested readers are referred to [48] for the motivation and other details of this proof.

Proof of Lemma 2. This lemma reveals a linear algebra property, and is essentially not related to $H$, $A$ and $\beta$ if we
define $L_\sigma$ directly by $S$. For brevity, we restate the main assertion to be proved as follows:

$$ \mathrm{eig}(QS) \subset \left[0, \tfrac{4}{3}\right), \tag{72} $$

where $S \in \mathbb{R}^{d \times d}$ is positive semidefinite, $S_{ii} \in \mathbb{R}^{d_i \times d_i}$ ($i = 1, \ldots, n$) is positive definite,

$$
(L_\sigma)_{\sigma(i), \sigma(j)} := \begin{cases} S_{\sigma(i)\sigma(j)}, & \text{if } i \ge j, \\ 0, & \text{otherwise}, \end{cases}
\qquad
Q := \frac{1}{n!} \sum_{\sigma \in \Gamma} L_\sigma^{-1}, \tag{73}
$$

and $\Gamma$ is the set consisting of all permutations of $(1, \ldots, n)$.

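Without loss of generality, we assume $S_{ii} = I_{d_i}$ ($i = 1, \ldots, n$). Otherwise, we denote

$$ D := \mathrm{Diag}\left(S_{11}^{-1/2}, \ldots, S_{nn}^{-1/2}\right). $$

It is easy to verify that $\tilde{Q} = D^{-1} Q D^{-1}$ if $\tilde{S} = DSD$, and $\tilde{L}_\sigma$ and $\tilde{Q}$ are defined by (73) with $\tilde{S}$. It holds that

$$ \mathrm{eig}(\tilde{Q}\tilde{S}) = \mathrm{eig}(D^{-1} Q D^{-1} DSD) = \mathrm{eig}(D^{-1} QSD) = \mathrm{eig}(QS), $$

and $\tilde{S}_{ii} = I_{d_i}$ ($i = 1, \ldots, n$). Due to the positive semidefiniteness of $S$, and by a slight abuse of the notation $A$, there
exists $A \in \mathbb{R}^{d \times d}$ satisfying $S = A^T A$. Let $A_i \in \mathbb{R}^{d \times d_i}$ ($i = 1, \ldots, n$) be the column blocks of $A$, and it is clear that
$S_{ij} = A_i^T A_j$ for all $1 \le i, j \le n$. In addition, it also holds that $\mathrm{eig}(QS) = \mathrm{eig}(AQA^T)$.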

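For brevity of notation, we define the block permutation matrix $P_k$ as follows:

$$
(P_k)_{ij} := \begin{cases}
I_{d_i}, & \text{if } 1 \le i = j \le k - 1; \\
I_{d_i}, & \text{if } k + 1 \le i = j + 1 \le n; \\
I_{d_k}, & \text{if } i = k,\ j = n; \\
0_{d_i \times d_j}, & \text{if } 1 \le j \le k - 1,\ i \neq j; \\
0_{d_i \times d_{j+1}}, & \text{if } k \le j \le n - 1,\ i \neq j + 1; \\
0_{d_i \times d_k}, & \text{otherwise}.
\end{cases} \tag{74}
$$

It can be easily verified that $P_k^T P_k = I_d$ and $P_n = I_d$. For $k \in \{1, \ldots, n\}$, we define $\Gamma_k := \{\sigma' \mid \sigma'$ is a permutation of
$(1, \ldots, k-1, k+1, \ldots, n)\}$. For any $\sigma' \in \Gamma_k$, we define $L_{\sigma'} \in \mathbb{R}^{(d - d_k) \times (d - d_k)}$ as follows:

$$
(L_{\sigma'})_{\sigma'(i), \sigma'(j)} := \begin{cases} S_{\sigma'(i)\sigma'(j)}, & \text{if } 1 \le j \le i \le n - 1, \\ 0, & \text{otherwise}. \end{cases} \tag{75}
$$

We define $\hat{Q}_k \in \mathbb{R}^{(d - d_k) \times (d - d_k)}$ by

$$ \hat{Q}_k := \frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} L_{\sigma'}^{-1}, \quad k = 1, \ldots, n, \tag{76} $$

and $W_k$ as the $k$-th block-column of $S$ excluding the block $S_{kk}$, i.e., $W_k = [S_{1k}; \ldots; S_{k-1,k}; S_{k+1,k}; \ldots; S_{nk}]$. Moreover, let
$\hat{A}_k := [A_1, \ldots, A_{k-1}, A_{k+1}, \ldots, A_n]$; we have $AP_k = [\hat{A}_k, A_k]$.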

Now we use mathematical induction to prove this lemma. Firstly, the assertion (72) and $Q \succ 0$ hold when $n = 1$, as
$QS = I$ in this case. Next, we will prove the lemma for any $n \ge 2$ given that the assertion (72) and $Q \succ 0$ hold for $n - 1$.
A key step of the proof is to reveal the following relationship:

$$ Q = \frac{1}{n} \sum_{k=1}^n P_k Q_k P_k^T, \tag{77} $$

where

$$ Q_k := \begin{bmatrix} \hat{Q}_k & -\frac{1}{2} \hat{Q}_k W_k \\ -\frac{1}{2} W_k^T \hat{Q}_k & I_{d_k} \end{bmatrix}, \tag{78} $$

in which $\hat{Q}_k$ is defined by (76). The proof of (77) will be provided later.

It directly follows from (77) that $AQA^T = \frac{1}{n} \sum_{k=1}^n A P_k Q_k P_k^T A^T$. Consequently,

$$
\frac{1}{n} \sum_{k=1}^n \lambda_{\min}(A P_k Q_k P_k^T A^T) \le \lambda_{\min}(AQA^T) \le \lambda_{\max}(AQA^T) \le \frac{1}{n} \sum_{k=1}^n \lambda_{\max}(A P_k Q_k P_k^T A^T). \tag{79}
$$

We will show, at the end of this proof, the fact that

$$ \mathrm{eig}(A Q_n A^T) \subset \left[0, \tfrac{4}{3}\right) \tag{80} $$

if it holds that

$$ \mathrm{eig}(\hat{Q}_n \hat{A}_n^T \hat{A}_n) \subset \left[0, \tfrac{4}{3}\right). \tag{81} $$

In fact, (81) holds directly by the induction assumption. Together with the similarity among the blocks, the relationship
(80) implies

$$ \mathrm{eig}(A P_k Q_k P_k^T A^T) \subset \left[0, \tfrac{4}{3}\right) \quad \text{for all } k = 1, \ldots, n. \tag{82} $$

Substituting (82) into (79), we prove the assertion (72) for $n$, and hence complete the proof of Lemma 2.

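Our remaining task is to prove the relationships (77) and (80). We will achieve this goal by the following two steps.
Step 1. Let $\sigma' \in \Gamma_k$; we can partition $L_{\sigma'}$ as follows:

$$ L_{\sigma'} = \begin{bmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{bmatrix}. \tag{83} $$

Here the sizes of $Z_{11}$ and $Z_{22}$ are $(d_1 + \cdots + d_{k-1}) \times (d_1 + \cdots + d_{k-1})$ and $(d_{k+1} + \cdots + d_n) \times (d_{k+1} + \cdots + d_n)$,
respectively. The sizes of $Z_{12}$ and $Z_{21}$ can be determined accordingly. We denote

$$ U_k := (A_1, \ldots, A_{k-1}), \quad V_k := (A_{k+1}, \ldots, A_n), $$

which implies

$$ W_k = [U_k, V_k]^T A_k = \begin{bmatrix} U_k^T A_k \\ V_k^T A_k \end{bmatrix}. $$

It is then easy to verify that

$$ L_{(\sigma', k)} = \begin{bmatrix} Z_{11} & U_k^T A_k & Z_{12} \\ 0 & I_{d_k} & 0 \\ Z_{21} & V_k^T A_k & Z_{22} \end{bmatrix}. \tag{84} $$

Left and right multiplying both sides of the above relationship by $P_k^T$ and $P_k$, respectively, we obtain

$$
P_k^T L_{(\sigma', k)} P_k = \begin{bmatrix} Z_{11} & Z_{12} & U_k^T A_k \\ Z_{21} & Z_{22} & V_k^T A_k \\ 0 & 0 & I_{d_k} \end{bmatrix} = \begin{bmatrix} L_{\sigma'} & W_k \\ 0 & I_{d_k} \end{bmatrix}. \tag{85}
$$

Taking the inverse of both sides of (85), we obtain

$$
P_k^T L_{(\sigma', k)}^{-1} P_k = \begin{bmatrix} L_{\sigma'} & W_k \\ 0 & I_{d_k} \end{bmatrix}^{-1} = \begin{bmatrix} L_{\sigma'}^{-1} & -L_{\sigma'}^{-1} W_k \\ 0 & I_{d_k} \end{bmatrix}. \tag{86}
$$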
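Summing up (86) for all $\sigma' \in \Gamma_k$ and dividing by $|\Gamma_k|$, we get

$$
\frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} P_k^T L_{(\sigma', k)}^{-1} P_k
= \begin{bmatrix} \frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} L_{\sigma'}^{-1} & -\frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} L_{\sigma'}^{-1} W_k \\ 0 & I_{d_k} \end{bmatrix}
= \begin{bmatrix} \hat{Q}_k & -\hat{Q}_k W_k \\ 0 & I_{d_k} \end{bmatrix}. \tag{87}
$$

Here, the last equality follows from (76). By the definition of $L_\sigma$, it is easy to verify that $L_\sigma^T = L_{\bar{\sigma}}$, where $\bar{\sigma}$ is a “reverse
permutation” of $\sigma$ that satisfies $\bar{\sigma}(i) = \sigma(n + 1 - i)$ ($i = 1, \ldots, n$). Thus we have $L_{(\sigma', k)} = L_{(k, \bar{\sigma}')}^T$, where $\bar{\sigma}'$ is a reverse
permutation of $\sigma'$. Summing over all $\sigma'$, we get

$$
\sum_{\sigma' \in \Gamma_k} L_{(\sigma', k)}^{-T} = \sum_{\sigma' \in \Gamma_k} L_{(k, \bar{\sigma}')}^{-1} = \sum_{\sigma' \in \Gamma_k} L_{(k, \sigma')}^{-1},
$$

where the last equality follows from the fact that summing over $\sigma'$ is the same as summing over $\bar{\sigma}'$. Thus, we have

$$
\frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} P_k^T L_{(k, \sigma')}^{-1} P_k
= \left(\frac{1}{|\Gamma_k|} \sum_{\sigma' \in \Gamma_k} P_k^T L_{(\sigma', k)}^{-1} P_k\right)^T
= \begin{bmatrix} \hat{Q}_k & 0 \\ -W_k^T \hat{Q}_k & I_{d_k} \end{bmatrix}.
$$

Here, the last equality uses the symmetry of $\hat{Q}_k$. Combining the above relation, (87) and the definition of $Q_k$, we have

$$
\frac{1}{2|\Gamma_k|} P_k^T \sum_{\sigma' \in \Gamma_k} \left(L_{(k, \sigma')}^{-1} + L_{(\sigma', k)}^{-1}\right) P_k
= \begin{bmatrix} \hat{Q}_k & -\frac{1}{2} \hat{Q}_k W_k \\ -\frac{1}{2} W_k^T \hat{Q}_k & I_{d_k} \end{bmatrix} = Q_k. \tag{88}
$$

Using the definition of $P_k$ and the fact that $|\Gamma_k| = (n - 1)!$, we can rewrite (88) as

$$ P_k Q_k P_k^T = \frac{1}{2(n - 1)!} \sum_{\sigma' \in \Gamma_k} \left(L_{(k, \sigma')}^{-1} + L_{(\sigma', k)}^{-1}\right). $$

Summing up the above relation for $k = 1, \ldots, n$ and then dividing by $n$, we immediately arrive at (77).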

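Step 2. For simplicity, we use $W$, $\hat{Q}$ and $\hat{A}$ to take the place of $W_n$, $\hat{Q}_n$ and $\hat{A}_n$, respectively.

By the induction assumption, we have $\hat{Q} \succ 0$, which implies $\Theta := W^T \hat{Q} W \succeq 0$. Recall that $S_{nn} = A_n^T A_n = I_{d_n}$;
we have

$$
\rho(\Theta) = \max_{v \in \mathbb{R}^{d_n},\ \|v\| = 1} v^T A_n^T \hat{A} \hat{Q} \hat{A}^T A_n v \le \rho(\hat{A} \hat{Q} \hat{A}^T) \max_{v \in \mathbb{R}^{d_n},\ \|v\| = 1} \|A_n v\|^2 < \frac{4}{3} \|A_n\|^2 = \frac{4}{3}. \tag{89}
$$

Hence, we obtain

$$ 0 \preceq \Theta \prec \frac{4}{3} I_{d_n}. \tag{90} $$

Recalling the definition (78), we have

$$
Q_n = \begin{bmatrix} \hat{Q} & -\frac{1}{2} \hat{Q} W \\ -\frac{1}{2} W^T \hat{Q} & I_{d_n} \end{bmatrix}
= J \begin{bmatrix} \hat{Q} & 0 \\ 0 & I_{d_n} - \frac{1}{4} W^T \hat{Q} W \end{bmatrix} J^T
= J \begin{bmatrix} \hat{Q} & 0 \\ 0 & C \end{bmatrix} J^T, \tag{91}
$$

where $J := \begin{bmatrix} I_{d - d_n} & 0 \\ -\frac{1}{2} W^T & I_{d_n} \end{bmatrix}$ and $C := I_{d_n} - \frac{1}{4} W^T \hat{Q} W$. Apparently, we have $C \succ 0$. Together with $\hat{Q} \succ 0$, it implies
$Q_n \succ 0$. Thus, we directly obtain $\mathrm{eig}(A Q_n A^T) \subset [0, \infty)$. It remains to show

$$ \rho(A Q_n A^T) < \frac{4}{3}. \tag{92} $$

Denote $B := \hat{A}^T \hat{A}$; then we can write $S$ as

$$ S = A^T A = \begin{bmatrix} B & W \\ W^T & I_{d_n} \end{bmatrix}. $$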
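We can reformulate $\rho(A Q_n A^T)$ as follows:

$$
\rho(A Q_n A^T) = \rho\left(A J \begin{bmatrix} \hat{Q} & 0 \\ 0 & C \end{bmatrix} J^T A^T\right) = \rho\left(\begin{bmatrix} \hat{Q} & 0 \\ 0 & C \end{bmatrix} J^T A^T A J\right). \tag{93}
$$

It is easy to verify that

$$
J^T A^T A J = \begin{bmatrix} I_{d - d_n} & -\frac{1}{2} W \\ 0 & I_{d_n} \end{bmatrix} \begin{bmatrix} B & W \\ W^T & I_{d_n} \end{bmatrix} \begin{bmatrix} I_{d - d_n} & 0 \\ -\frac{1}{2} W^T & I_{d_n} \end{bmatrix}
= \begin{bmatrix} B - \frac{3}{4} W W^T & \frac{1}{2} W \\ \frac{1}{2} W^T & I_{d_n} \end{bmatrix}.
$$

Thus,

$$
Z := \begin{bmatrix} \hat{Q} & 0 \\ 0 & C \end{bmatrix} J^T A^T A J = \begin{bmatrix} \hat{Q} B - \frac{3}{4} \hat{Q} W W^T & \frac{1}{2} \hat{Q} W \\ \frac{1}{2} C W^T & C \end{bmatrix}. \tag{94}
$$

According to (93), it suffices to prove $\rho(Z) < \frac{4}{3}$. Suppose $\lambda$ is an arbitrary eigenvalue of $Z$, and $v \in \mathbb{R}^d$ is one of its
associated eigenvectors. In the rest, we only need to show that

$$ \lambda < \frac{4}{3} \tag{95} $$

holds. Then, using its arbitrariness, we have $\rho(Z) < \frac{4}{3}$, which implies (92), and then (80) holds.

Partition $v$ into $v = \begin{bmatrix} v_1 \\ v_0 \end{bmatrix}$, where $v_1 \in \mathbb{R}^{d - d_n}$ and $v_0 \in \mathbb{R}^{d_n}$. Then, $Zv = \lambda v$ implies that

$$ \left(\hat{Q} B - \tfrac{3}{4} \hat{Q} W W^T\right) v_1 + \tfrac{1}{2} \hat{Q} W v_0 = \lambda v_1, \tag{96} $$

$$ \tfrac{1}{2} C W^T v_1 + C v_0 = \lambda v_0. \tag{97} $$

Suppose first that $\lambda I_{d_n} - C$ is singular, i.e., $\lambda$ is an eigenvalue of $C$. By the definition of $C$ and (90), we have
$\frac{2}{3} I_{d_n} \prec C = I_{d_n} - \frac{1}{4} \Theta \preceq I_{d_n}$, which implies that $\lambda \le 1$; thus inequality (95) holds. In the following, we assume $\lambda I_{d_n} - C$ is nonsingular. An
immediate consequence is $v_1 \neq 0$.

By (97), we obtain $v_0 = \frac{1}{2} (\lambda I_{d_n} - C)^{-1} C W^T v_1$. Substituting this explicit formula into (96), we obtain

$$
\lambda v_1 = \left(\hat{Q} B - \tfrac{3}{4} \hat{Q} W W^T\right) v_1 + \tfrac{1}{4} \hat{Q} W (\lambda I_{d_n} - C)^{-1} C W^T v_1 = (\hat{Q} B + \hat{Q} W \Phi W^T) v_1, \tag{98}
$$

where $\Phi := -I_{d_n} + \lambda [(4\lambda - 4) I_{d_n} + \Theta]^{-1}$. Since $\Theta$ is a symmetric matrix, $\Phi$ is also symmetric.

Suppose $\lambda_{\max}(\Phi) > 0$. The definition of $\Phi$ gives us

$$ \theta \in \mathrm{eig}(\Theta) \iff -1 + \frac{\lambda}{(4\lambda - 4) + \theta} \in \mathrm{eig}(\Phi). $$

Together with $\lambda_{\max}(\Phi) > 0$, there exists $\theta \in \mathrm{eig}(\Theta)$ such that $-1 + \frac{\lambda}{(4\lambda - 4) + \theta} > 0$. If $\lambda \le 1$, (95) already holds. Otherwise,
$\lambda > 1$, which implies $1 < \frac{\lambda}{(4\lambda - 4) + \theta} \le \frac{\lambda}{4\lambda - 4}$, so $\lambda < \frac{4}{3}$ and (95) holds.

Now we assume $\lambda_{\max}(\Phi) \le 0$, i.e., $\Phi \preceq 0$. By the induction assumption, we have $\hat{\lambda} := \rho(\hat{Q} B) = \rho(\hat{Q} \hat{A}^T \hat{A}) \in [0, \frac{4}{3})$.
Due to the positive definiteness of $\hat{Q}$, there exists a nonsingular $U \in \mathbb{R}^{(d - d_n) \times (d - d_n)}$ such that $\hat{Q} = U^T U$. Let
$Y := U W \Phi W^T U^T \in \mathbb{R}^{(d - d_n) \times (d - d_n)}$.

We have $v^T Y v = v^T U W \Phi W^T U^T v = (W^T U^T v)^T \Phi (W^T U^T v) \le 0$ for all $v \in \mathbb{R}^{d - d_n}$, where the last
inequality follows from $\Phi \preceq 0$. Thus, $Y \preceq 0$. Pick an arbitrary $g$ satisfying $g > \rho(Y)$. Then, it holds that

$$ \rho(g I_{d - d_n} + Y) \le g. \tag{99} $$

From (98), we can conclude that $(g + \lambda) v_1 = (\hat{Q} B + \hat{Q} W \Phi W^T + g I_{d - d_n}) v_1$. Consequently,

$$ g + \lambda \in \mathrm{eig}(\hat{Q} B + \hat{Q} W \Phi W^T + g I_{d - d_n}) = \mathrm{eig}(U B U^T + U W \Phi W^T U^T + g I_{d - d_n}), $$

which implies

$$ g + \lambda \le \rho(U B U^T + Y + gI) \le \rho(U B U^T) + \rho(Y + gI) = \hat{\lambda} + \rho(Y + gI) \le \hat{\lambda} + g, \tag{100} $$

where the last inequality follows from (99). The relation (100) directly gives us that $\lambda \le \hat{\lambda} < \frac{4}{3}$. Namely, (95) also holds
in this case.

We have completed the proof. □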


Appendix B. Proof of Lemma 3. For convenience, we use the notation

$$ g(\lambda; S, T) := \det[(\lambda - 1)^2 I + (2\lambda - 1) S + (\lambda - 1) T]. $$

We prove this lemma by mathematical induction on the dimension $d$. When $d = 1$, it is easily seen that

$$
g(\lambda; S, T) = \begin{cases}
(\lambda - 1)^0 [(\lambda - 1)^2 + (2\lambda - 1) S + (\lambda - 1) T], & \text{if } S \neq 0, \\
(\lambda - 1)^1 (\lambda - 1 + T), & \text{if } S = 0,\ T \neq 0, \\
(\lambda - 1)^2 \cdot 1, & \text{if } S = 0,\ T = 0,
\end{cases}
$$

which means that Lemma 3 holds in this case. Suppose this lemma is valid for $d \le k - 1$. Consider the case where $d = k$.
Case 1: $S \succ 0$. In this case, $\mathrm{Rank}(S) = \mathrm{Rank}(S + T) = k$ and then $l = 0$. Because

$$ g(\lambda; S, T) = (\lambda - 1)^0 g(\lambda; S, T) \quad \text{and} \quad g(1; S, T) = \det(S) > 0, $$

Lemma 3 holds in this case.

Case 2: $S \succeq 0$ but not positive definite. Let $S$ admit the following eigenvalue decomposition

$$ P^T S P = \mathrm{Diag}(0, s_1, \ldots, s_{k-1}) =: D, $$

where $P$ is an orthogonal matrix and $s_i \ge 0$. If we let $W = P^T T P \succeq 0$, then

$$ g(\lambda; S, T) = g(\lambda; D, W). $$

The proof proceeds by considering the following two subcases.

Case 2.1: $W_{11} = 0$. Since $W$ is positive semidefinite, $W_{1i} = W_{i1} = 0$ for $i = 1, 2, \ldots, k$. Note that

$$ g(\lambda; D, W) = (\lambda - 1)^2 g(\lambda; D', W'), $$

where $D'$ and $W'$ are the submatrices of $D$ and $W$ obtained by deleting the first row and column. As we have
assumed that Lemma 3 holds for $d = k - 1$, there exists a polynomial $p(x)$ such that

$$ g(\lambda; D, W) = (\lambda - 1)^2 (\lambda - 1)^{2k - 2 - \mathrm{Rank}(D') - \mathrm{Rank}(D' + W')} p(\lambda). $$

Note that $\mathrm{Rank}(D') = \mathrm{Rank}(D) = \mathrm{Rank}(S)$ and $\mathrm{Rank}(D' + W') = \mathrm{Rank}(D + W) = \mathrm{Rank}(S + T)$. Thus,
we have

$$ g(\lambda; S, T) = (\lambda - 1)^{2k - \mathrm{Rank}(S) - \mathrm{Rank}(S + T)} p(\lambda), $$

which implies that Lemma 3 is true for $d = k$ in this subcase.

Case 2.2: $W_{11} \neq 0$. Without loss of generality, assume $W_{11} = 1$. Let $w^T = [W_{12}, \ldots, W_{1k}]$. By direct calculation, we
obtain

$$ g(\lambda; D, W) = (\lambda - 1)^2 g(\lambda; D', W') + (\lambda - 1) g(\lambda; D', W' - w w^T). $$

Since $\mathrm{Rank}(D' + W') \le \mathrm{Rank}(D + W) = \mathrm{Rank}(S + T)$, there exists a polynomial $p_1(x)$ such that

$$ g(\lambda; D', W') = (\lambda - 1)^{2k - 2 - \mathrm{Rank}(S) - \mathrm{Rank}(S + T)} p_1(\lambda), $$

where $p_1(1) \ge 0$. On the other hand, since $\mathrm{Rank}(D' + W' - w w^T) = \mathrm{Rank}(D + W) - 1 = \mathrm{Rank}(S + T) - 1$,
there exists a polynomial $p_2(x)$ such that

$$ g(\lambda; D', W' - w w^T) = (\lambda - 1)^{2k - 1 - \mathrm{Rank}(S) - \mathrm{Rank}(S + T)} p_2(\lambda), $$

where $p_2(1) > 0$. Therefore,

$$ g(\lambda; S, T) = (\lambda - 1)^{2k - \mathrm{Rank}(S) - \mathrm{Rank}(S + T)} (p_1(\lambda) + p_2(\lambda)), $$

and then Lemma 3 holds for this subcase.

This completes the proof. □

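Appendix C. Proof of Lemma 4. It is easily seen that

$$ \mathrm{Rank}(S) + \mathrm{Rank}(\beta A^T A) = \mathrm{Rank} \begin{bmatrix} S & 0 \\ 0 & \beta A A^T \end{bmatrix}, $$

and therefore we need only prove that

$$ \mathrm{Rank} \begin{bmatrix} S & -A^T \\ \beta A & 0 \end{bmatrix} = \mathrm{Rank} \begin{bmatrix} S & 0 \\ 0 & \beta A A^T \end{bmatrix}. \tag{101} $$

Indeed, consider the following linear system

$$ \begin{bmatrix} S & -A^T \\ \beta A & 0 \end{bmatrix} \begin{bmatrix} x \\ \mu \end{bmatrix} = 0, \tag{102} $$

which is equivalent to

$$ Sx - A^T \mu = 0, \qquad Ax = 0. $$

It then holds that

$$ x^T S x = x^T A^T \mu = (Ax)^T \mu = 0, $$

and therefore $Sx = 0$ and $A^T \mu = 0$, because $S = H + \beta A^T A$ is positive semidefinite. This means that

$$ \begin{bmatrix} S & 0 \\ 0 & \beta A A^T \end{bmatrix} \begin{bmatrix} x \\ \mu \end{bmatrix} = 0. \tag{103} $$

On the other hand, it is not difficult to verify that any solution of (103) is a solution of (102); in other words, the linear
systems (102) and (103) are equivalent. As a result, the rank equality (101) holds, which completes the proof. □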
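The rank identity of Lemma 4 can be spot-checked numerically on a deliberately rank-deficient instance. The sketch below is illustrative only: the function name, the explicit rank tolerance and the data (a rank-one $H$ and a constraint matrix whose third row is the sum of the first two) are assumptions made here.

```python
import numpy as np

def lemma4_ranks(H, A, beta, tol=1e-10):
    """Return (Rank[[S, -A'], [beta*A, 0]], Rank(S) + Rank(beta*A'A))."""
    m = A.shape[0]
    S = H + beta * A.T @ A
    K = np.block([[S, -A.T], [beta * A, np.zeros((m, m))]])
    lhs = int(np.linalg.matrix_rank(K, tol=tol))
    rhs = int(np.linalg.matrix_rank(S, tol=tol)) \
        + int(np.linalg.matrix_rank(beta * A.T @ A, tol=tol))
    return lhs, rhs

# Hypothetical data: S is singular (rank 3 in R^4) and A has rank 2.
h = np.array([1.0, -1.0, 2.0, 0.5])
a1 = np.array([1.0, 2.0, -1.0, 1.0])
a2 = np.array([0.0, 1.0, 1.0, -1.0])
H = np.outer(h, h)
A = np.vstack([a1, a2, a1 + a2])
lhs, rhs = lemma4_ranks(H, A, beta=1.0)
```

Here the null space of the block matrix has dimension $(4 - 3) + (3 - 2) = 2$, so both sides are expected to equal $7 - 2 = 5$.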